What classes should I take if I want to become a data scientist? - Quora
What classes should I take if I want to become a data scientist?
Given the reported talent gap for data scientists (http://www.emc.com/collateral/ab...), how
should universities and industry be training people?
Answer Wiki
More specific versions of this question for particular universities (feel free to add yours!):
What classes should I take at Berkeley if I want to become a data scientist?
What classes should I take at Brown if I want to become a data scientist?
What classes should I take at Caltech if I want to become a data scientist?
What classes should I take at CMU if I want to become a data scientist?
What classes should I take at Cornell if I want to become a data scientist?
What classes should I take at Duke if I want to become a data scientist?
What classes should I take at Georgia Tech if I want to become a data scientist?
What classes should I take at Harvard if I want to become a data scientist?
What classes should I take at MIT if I want to become a data scientist?
What classes should I take at Princeton if I want to become a data scientist?
What classes should I take at Stanford if I want to become a data scientist?
What classes should I take at UCLA if I want to become a data scientist?
What classes should I take at the University of Chicago if I want to become a data scientist?
What classes should I take at UT Austin if I want to become a data scientist?
What classes should I take at Yale if I want to become a data scientist?
What classes should I take at IISc (Indian Institute of Science) if I want to become a data scientist?
26 Answers

William Chen, Data Scientist at Quora
22.2k Views · Upvoted by Sean Owen, Director, Data Science @ Cloudera
William is a Most Viewed Writer in Data Science.

A data science curriculum should mostly be a combination of Statistics and Computer Science
classes, with additional relevant classes from other departments (e.g. Applied Math, Math,
Econ).
Here are my suggestions on a full curriculum for a data science program:
https://www.quora.com/WhatclassesshouldItakeifIwanttobecomeadatascientist
12/24/2015
Introduction
- One year of multivariable calculus and linear algebra / matrix algebra
- One year of intro CS
- One year of intro probability and inference

Core Classes
- Data science
- Machine learning
- Linear modeling
- Predictive modeling

Stats electives
- More linear models
- Time series analysis
- Statistical software
- Experimental design
- Survey analysis
- Causal inference
- Bayesian data analysis
- Nonparametric methods

CS electives
- Theory of computation / Analysis of algorithms
- Data structures and algorithms
- Software engineering
- Visualization
- Parallel programming / Massive computation (for processing huge datasets)
- Network analysis
- More machine learning
- Economics + Computer Science (game theory, auction design)

Other electives
- (Convex) optimization
- Behavioral economics

Thank you to those in the comments for suggesting more classes to add to the list!

For my answer on what major you should be if you want to be a data scientist, check out
What major should I choose if I want to be a data scientist?

This answer is part of What is the Data Science topic FAQ?
Updated Feb 3

Mark Meloon, Senior Data Scientist at Impetus
2.4k Views · Upvoted by Ryan Fox Squire, Neuroscientist Turned Data Scientist
Mark has 30+ answers in Data Science.

First, there's a difference between developing data products to be consumed by people
versus those consumed by other machines. But I'm assuming you mean the former, so
that's what I'll talk about.

There are a lot of great answers here, but I just want to highlight a few aspects that don't
get nearly as much attention as they should.

Causality: Dr. Anonymous below deserves more upvotes. At the end of the day,
your audience wants actionable information. If you don't give it to them, you have
failed (more on this below). We're in a situation now where data scientists do
predictive modeling (based on historical data), decision makers act based on
that, and the result is... well, no one knows. That action was never in the model.

Conversational skills: Read Guy Cuthbert's answer. He points out a very
important, and woefully neglected, set of skills, namely being able to have a
conversation with non-specialists. I wrote about the importance of this in detail in
my answer to What is a data scientist's career path?
Most data scientists who claim to be "great communicators" are merely skilled at
one-way information transfer, such as writing and presenting. That doesn't cut it
in data science for reasons I explain. Guy's suggestion of Rhetoric and Cognitive
Psychology is right on the money. I have yet to see a university or MOOC do
anything but pay lip service to this critical aspect of data science.

Understand Data: Great answers by Alex Leavitt and Adam Marcus.
Mathematician John Allen Paulos has a great book entitled "Innumeracy" that
details just how poorly most people understand probabilities and other
mathematical concepts (see Synopses of Innumeracy, Math and Humor, and His
Other Books). You've got to be able to grok all this at a deep level.

And now for non-coursework: it is a very good idea to do some projects of your own
interest to demonstrate your initiative, your passion for the subject, and that you are a
self-starter. Note that a class project doesn't count (see Data Science Interview: I Don't Care
About Your Class Projects). Personally, I don't particularly care about Kaggle
competitions either (see Mark Meloon's answer to How useful are Kaggle competitions for
getting interviews for someone already working as a data scientist?). I'm much more
interested in projects of your own design and those that demonstrate your ability to work
well in a team environment.

There's more, of course, but the other commenters on this page have done an excellent job
of covering those. My bio is "Data Science: the straight, no-hype truth" and I feel
compelled to point out that data science is far more than sitting in front of your computer
all day, geeking out on using the most sophisticated algorithm you can think of to spit up
results.

Finally, go for it! Data science is way cool and there really is nothing quite like it. The
criticism that it's merely a sexed-up version of statistics is way off. Yeah, training to become
one has unique challenges, but it'll be worth it in the end.

And keep asking questions on Quora. There's a slew of extremely knowledgeable people
here who are very eager to help!

Mark
Written Jan 31

Rahul Agarwal, Data Scientist at Citi
2.7k Views · Upvoted by Ryan Fox Squire, Neuroscientist Turned Data Scientist
Rahul is a Most Viewed Writer in Big Data.

I can only tell you what I have done so far and what I intend to work on additionally to
become a better data scientist.

What follows is my own data science curriculum. It is aimed at Computer
Science with a specialization in Machine Learning.

My main aim here is to learn about Mathematics, Statistics, Computer Science and
Machine Learning, though not necessarily in that order.

I have categorized the courses here into two types:
1. F: Foundational Class
2. A: Advanced Specialization

MATHEMATICS:

(F1) Linear Algebra by Gilbert Strang:
A great class by a great teacher. I would definitely recommend this class to anyone who
wants to learn LA.

(F2) Multivariate Calculus, MIT OCW: TODO

COMPUTER SCIENCE:

(F1) CS50x: Introduction to Computer Science, Harvard
This is an Introduction to Computer Science class taught by David Malan. It helped me with
many misunderstandings and helped build intuition around the whole CS playground.
It starts with a basic introduction to C and some programming exercises, and ends up teaching
the basics of PHP, JavaScript and HTML/CSS as well. The projects in this class are really
awesome. The GitHub code repository for this class is HERE

(F2) CS101x: MITx introduction to programming using Python:
The course is an introduction to many of the important concepts in computer science.
It covers simple algorithms, asymptotic times, classes, OOP, trees, exceptions,
assertions, hashing and a whole lot of other stuff.

(F3) Algorithms and Data Structures, MIT OCW: CURRENTLY working on

(F4) RICE University: CompSci Mini Specialization
This is a series of 6 short but good courses. I worked on these courses because data science
will require you to do a lot of programming, and the best way to learn programming is by
doing programming. The lectures are good but the problems and assignments are
awesome. It consists of three main courses:

1> Interactive Programming in Python: The course starts with teaching Python but
quickly moves into creating graphical user interfaces and games using Python in
CodeSkulptor. I created some very basic games in this course as part of the coursework.
Some of them are:
- Guess The Number
- Stop Watch
- Pong
- Memory
- BlackJack
- RiceRocks

2> Principles of Computing: This course builds on the previous course, but here the
focus is more on thinking programmatically rather than on GUIs. The projects are really great
as the course progresses with creating games.
- Solitaire Mancala
- 2048
- Tic Tac Toe Using Monte Carlo
- Yahtzee
- Cookie Clicker
- Zombie Apocalypse
- Word Wrangler
- Tic Tac Toe Using Minimax
- Fifteen Puzzle

3> Algorithmic Thinking: This course starts with a focus on graph algorithms and data
structures. The code is hosted on GitHub.

STATISTICS:

(F1) Stat110: Introduction to Probability: Joe Blitzstein, Harvard University
Conditioning is the Soul of Statistics.

I took this course to enhance my understanding of probability distributions and statistics,
but this course taught me a lot more than that. Apart from learning to think
conditionally, this also taught me how to explain difficult concepts with a story.

This was a hard class but definitely fun. The focus was not only on getting
mathematical proofs but also on understanding the intuition behind them and how
intuition can help in deriving them more easily. Sometimes the same proof was done in
different ways to facilitate learning of a concept.

One of the things I liked most about this course is the focus on concrete examples while
explaining abstract concepts. The inclusion of the Gambler's Ruin Problem, Matching
Problem, Birthday Problem, Monty Hall, Simpson's Paradox, St. Petersburg
Paradox etc. made this course much, much more exciting than a normal Statistics
course.
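
One way to build the kind of intuition described above is to check a counterintuitive result by simulation. The sketch below is not taken from the course. It is a minimal Monte Carlo estimate for the Monty Hall problem, assuming the standard rules (three doors, and the host always opens a goat door the player did not pick). The function names are hypothetical.

```python
import random

def monty_hall_trial(switch, rng):
    """Play one game; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the single remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def estimate_win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability of a strategy by repeated simulation."""
    rng = random.Random(seed)
    wins = sum(monty_hall_trial(switch, rng) for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print("stay  :", estimate_win_rate(switch=False))
    print("switch:", estimate_win_rate(switch=True))
```

With enough trials the "switch" estimate should settle near 2/3 and the "stay" estimate near 1/3, which matches the conditional-probability argument.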

I will definitely be on the lookout for more courses by Joe after this, and I have already done
one more course by him, CS109. More on that later.

The Top 10 Ideas covered in this class are:
1. Probability, Conditioning is the soul of Statistics, Story Proofs
2. Bayes' Theorem, Law of Total Probability, First Step Analysis
3. Expectation and Variance for discrete RVs and continuous RVs. LOTUS.
4. Discrete (Bernoulli, Binomial, Hypergeometric, Geometric, Negative Binomial, FS,
Poisson) and Continuous (Uniform, Normal, Expo, Beta, Gamma) Distributions
and the stories behind them
5. Moment Generating Functions (MGFs) and their Properties
6. Joint and Marginal Distributions, Covariance and Correlation
7. Convolutions and Transformations
8. Conditional Expectation, Adam and Eve Laws
9. Law of Large Numbers and CLT
10. Markov Chains
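
Several of the ideas in the list above have compact standard statements. As a reference sketch (textbook forms, not copied from the course notes), take events $A$ and $B$, a partition $A_1, \dots, A_n$ of the sample space, and random variables $X$ and $Y$, with $X$ discrete in the LOTUS line:

```latex
\begin{align*}
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)} && \text{(Bayes' theorem)} \\
P(B) &= \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i) && \text{(law of total probability)} \\
E[g(X)] &= \sum_{x} g(x)\,P(X = x) && \text{(LOTUS, discrete case)} \\
E(Y) &= E\big(E(Y \mid X)\big) && \text{(Adam's law)} \\
\operatorname{Var}(Y) &= E\big(\operatorname{Var}(Y \mid X)\big) + \operatorname{Var}\big(E(Y \mid X)\big) && \text{(Eve's law)}
\end{align*}
```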

Solving the problem sets and the midterm reviews helped me a lot in grasping the abstract
concepts.

(F2) Stat111: TODO
Uses DeGroot and Schervish for instruction. No lecture videos are available, so I plan to read
the book and complete the problem sets online from the Stat111 website. I so wish the lectures
were there.

(A1) Bayesian Statistics STAT544: TODO
A lecture series on Bayesian statistics by Jarad Niemi at ISU.

(A2) Discrete Stochastic Processes, MIT OCW: TODO
I got highly interested in probability after STAT110, so I added this here. It is an alternative to
one of the next courses to take after STAT110 that Professor Joe Blitzstein talks about in
the course, apart from STAT111.

MACHINE LEARNING:

(F1) MITx The Analytics Edge:
This is a fantastic course for learning about R as well as the implementations of various
machine learning algorithms in R. Very basic, very crisp and very informative. The
scenarios and examples range from Moneyball to Watson. The only problem with this
course is that its problem sets feel a little repetitive.
Here is the location of my R code repository for this course

(F2) Intro to Data Science, University of Washington
My first ML class. It took a little long to grasp the concepts, but in hindsight that might be
because of my lack of exposure to the material. It was my first grapple with tools like R
and Python. It covers a whole lot of ground, from R to Python to MapReduce. I would put it here
as it gives a thorough perspective of the whole data science space.

(F3) Data Science CS109: Again by Professor Blitzstein. Again an awesome course.
Watch it after Stat110, as you will be able to understand everything much better with a
thorough grounding in Stat110 concepts. You will learn about Python libraries for data
science, along with a thorough intuitive grounding in various machine learning algorithms.
Course description from the website:
Learning from data in order to gain useful predictions and insights. This course
introduces methods for five key facets of an investigation: data wrangling, cleaning, and
sampling to get a suitable data set; data management to be able to access big data
quickly and reliably; exploratory data analysis to generate hypotheses and intuition;
prediction based on statistical methods such as regression and classification; and
communication of results through visualization, stories, and interpretable summaries.

(A1) CS229: Andrew Ng:
Contains the maths behind many of the machine learning algorithms. The game-changer
machine learning course. I will put this course as numero uno, as this course motivated me
to get into this field and Andrew Ng is a great instructor.

DISTRIBUTED AND PARALLEL COMPUTING:

(A1) Intro to Hadoop & MapReduce, Udacity
Very easy course. It teaches the fundamentals of Hadoop streaming with Python and is taught by
Cloudera on Udacity. I am doing much more advanced stuff with Python and MapReduce
now, but this is one of the courses that laid the foundation there.

(A2) BerkeleyX: Introduction to Big Data with Apache Spark and (A3)
BerkeleyX: CS190.1x Scalable Machine Learning
A mighty flame followeth a tiny spark.

This is a series of courses in Spark taught by Anthony D. Joseph, a Professor in Electrical
Engineering and Computer Science at UC Berkeley, and Ameet Talwalkar, a well-known
name in the Spark community.

This course delivers on what it says. It teaches Spark. Total beginners will have difficulty
following the course as it progresses very fast. That said, anyone with a decent
understanding of how big data works will be OK.

The top ideas covered in this course are:
1. RDD Transformations (map, flatMap, filter, distinct, groupByKey, sortByKey,
reduceByKey)
2. RDD Actions (reduce, takeOrdered, take, collect)
3. Accumulator and Broadcast Variables
4. DataFrames in PySpark
5. SQL on paired RDDs: leftOuterJoin, rightOuterJoin, fullOuterJoin
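
To make the transformation/action vocabulary above concrete, here is a minimal word-count sketch in plain Python. It does not use Spark. The helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins that only imitate what the RDD methods `flatMap` and `reduceByKey` do to the data, and the input lines are a toy example rather than course data.

```python
from collections import defaultdict
from functools import reduce

def flat_map(func, items):
    """Apply func to each item and flatten the results (imitates RDD.flatMap)."""
    return [out for item in items for out in func(item)]

def reduce_by_key(func, pairs):
    """Combine all values that share a key using func (imitates RDD.reduceByKey)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}

lines = ["to be or not to be", "that is the question"]  # toy input

words = flat_map(str.split, lines)                 # transformation: flatMap
pairs = [(word, 1) for word in words]              # transformation: map
counts = reduce_by_key(lambda a, b: a + b, pairs)  # transformation: reduceByKey

# Action: take the two most frequent words, ties broken alphabetically
# (roughly what takeOrdered with a custom key would return).
top2 = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:2]
print(top2)
```

In Spark the same pipeline would run lazily and in parallel across partitions. This sketch only shows the shape of the data at each step.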

I certainly liked the mini projects in the class:
1. Word count in Spark: A word counting program to count the words in all of
Shakespeare's plays
2. Apache Log File analysis in Spark: Use Spark to explore a NASA Apache web
server log
3. Entity Resolution: Entity resolution using TF-IDF approaches in Spark
4. Movie Recommendation using ALS: Predicting movie ratings using Spark
5. Linear Regression: Predicting song year using linear regression in Spark
6. Logistic Regression: Predicting click-through rates using Spark. One-hot
encoding and hashing explained.
7. PCA: Running PCA on neuroscience data
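
The one-hot encoding and hashing mentioned in the logistic regression project can be sketched in a few lines. This is an illustration in plain Python, not the course's code. The feature names and records are made up, and `hashlib.md5` is an arbitrary choice of stable hash for the example.

```python
import hashlib

def one_hot_index(samples):
    """Assign each distinct (feature, value) pair its own column index."""
    distinct = sorted({pair for sample in samples for pair in sample})
    return {pair: idx for idx, pair in enumerate(distinct)}

def one_hot_encode(sample, index):
    """Return a dense 0/1 vector with a 1 for every pair present in the sample."""
    vec = [0] * len(index)
    for pair in sample:
        if pair in index:  # pairs never seen when building the index are ignored
            vec[index[pair]] = 1
    return vec

def hash_encode(sample, num_buckets):
    """Feature hashing: the column is a stable hash of the pair modulo num_buckets."""
    vec = [0] * num_buckets
    for feature, value in sample:
        digest = hashlib.md5(f"{feature}={value}".encode("utf-8")).hexdigest()
        vec[int(digest, 16) % num_buckets] += 1  # colliding pairs add up
    return vec

# Hypothetical click-log records as lists of (feature, value) pairs.
samples = [
    [("device", "mobile"), ("site", "news")],
    [("device", "desktop"), ("site", "news")],
]
index = one_hot_index(samples)
print(one_hot_encode(samples[0], index))
print(hash_encode(samples[0], num_buckets=4))
```

One-hot encoding needs one column per distinct pair, so the vector grows with the data. Hashing fixes the width up front and accepts occasional collisions in exchange.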

Some of the courses here may seem repetitive, but they all have provided some sort of
additional skill, so I have put them here.

I will update this answer with more details as I complete the TODO courses on the list.

Hope that helps :)
Written Dec 17

Adam Marcus, taught a 6-day data literacy course
5.8k Views · Upvoted by William Chen, Data Scientist at Quora

I want to echo something Joseph Adler mentioned at the end of his answer: the thing that
even academically well-equipped students will not have been exposed to is the toolbox
required to triage and process a hunk of raw data they acquire from some source.
Sprinkling in real-world datasets and data cleaning experience is key to a curriculum in
data science.

Eugene Wu and I recently taught a 6-day (3 hours per day) course on data literacy basics
targeted at computer science undergraduates [1]. Our initial motivation was selfish: as
databases researchers, we didn't have a lot of experience with an end-to-end raw data
-> data product pipeline. After a few trial runs of our own, we realized certain data
processing patterns kept showing up, and saw that we had a small course worth of content
on our hands. The important thing here is that even with undergraduate and graduate
level machine learning, statistics, and database courses under our belts, we still had a lot to
learn about working with honest-to-goodness dirty data.

Each module of our course could have had an entire semester dedicated to it, and so we
favored basic skills with lots of hands-on experience over intellectual depth and rigor. We
kept lectures to 20-30 minutes, giving students the remaining 2.5 hours to go through the
labs we set up while we walked around answering questions. Lectures allowed students to
know what they were in for at a high level, and the lab portion allowed them to cement
those concepts with real datasets, code, and diagrams. All of the course content is
available at [1], and here is a direct link to day 1's lab [2].

The syllabus we covered was:
Day 1: an end-to-end experience in downloading campaign contribution data
from the Federal Election Commission, cleaning it up, and programmatically
displaying it using basic charts.
Day 2: visualization/charting skills using election and county health data.
Day 3: statistics to take the hunches they got on day 2 and quantify them,
learning about t-tests and linear regression along the way.
Day 4: text processing/summarization using the Enron email corpus.
Day 5: MapReduce to scale up Day 4's analysis using Elastic MapReduce on
Amazon Web Services. This felt a bit forced, but the students were clamoring for
distributed data processing experience.
Day 6: the students teach us something they learned on their own datasets using
techniques we've taught them.
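
As an illustration of the Day 3 step, quantifying a hunch with a t-test and a linear regression, here is a minimal standard-library sketch. It is not the course's lab code. The numbers are made up, the function names are hypothetical, and the t-test shown is Welch's variant, reported only as a statistic without a p-value.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two independent samples."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up numbers for illustration only.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 2.0, 3.0]
print("t statistic:", welch_t(group_a, group_b))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print("slope, intercept:", fit_line(xs, ys))
```

A larger absolute t statistic means the difference in means is large relative to its sampling noise. Turning it into a p-value would need the t distribution, which a statistics library can supply.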

While we set out to give computer science students with familiarity in Python
programming a dive into data, we ended up with folks from the physical sciences, doctors,
and a few social scientists who had their own datasets to answer questions about. The last
day allowed them to experiment with their new skills on their own data. Attendance on
this day was lower than the previous days: the majority of the folks in attendance on day 6
were on the more experienced end, and I suspect that the undergrads, who were not yet
exposed to data problems of their own, didn't find it as engaging. It would be interesting
to see how to develop course content that allows self-directed data science for students
who still need a bit more inspiration.

I should also say that our attempt is not the first one to bring data to the classroom. Jeff
Hammerbacher and Mike Franklin at Berkeley have a wonderful semester-length course
on data science [3]. The high-level outline of the course seems similar, but they get farther
into data product design, and jump into each topic in more depth. Their resources page
[4] has a nice set of links to other educational efforts worth checking out.

[1] http://dataiap.github.com/dataiap/
[2] http://dataiap.github.com/dataia...
[3] http://datascienc.es/
[4] http://datascienc.es/resources/
Written Apr 11, 2012

Joseph Adler, Data Scientist at LinkedIn, O'Reilly Author
7.1k Views · Upvoted by Robert Chang (Data Janitor @ Twitter | Taiwanese American |
Statistically educated | Aspiring singer) and James Pitt

Today, we use the term "data science" to mean "doing stuff with data." Some data
scientists build products, some optimize businesses, others try to understand businesses.
Regardless of what a data scientist does, there are three things that a data scientist needs
to understand to be effective:
(1) Math
(2) Computer Science
(3) The problem that he or she is solving

Let me explain a little more about each one.

(1) Math. Whether you have a lot of data or a little bit, you're going to have to use some
math to make sense of it. Math helps you find patterns in data and determine if those
patterns are meaningful. In practice, this means a data scientist needs to know some
statistics and machine learning. It's helpful to know some algebra, signal processing, and
topology as well. (Seriously.)

(2) Computer Science. Today, almost all the data that you encounter will be generated by
and stored on computers. Often, you'll have to shrink that data, clean it up, or combine it
with other data. Sometimes, you'll have so much data that you can't solve your problem
quickly. In order to work with data, you'll have to know how to program a computer. But
in order to cope with large amounts of data, you'll need to know about computer
architecture and algorithms. You may even have to work with data that's stored in a cloud
or processed on a distributed system. I'd recommend that any data scientist learn the
basics of software engineering, algorithms, and computer architecture.

(3) The problem that he or she is solving. If you understand the problem you are trying to
solve, and the data that you are trying to use, you will be able to distinguish answers that
make sense from answers that do not, think of novel data sources to look for, and think of
new ways to solve problems. Don't underestimate the importance of understanding
economics, physics, biology, or human psychology when you're tackling a problem. In
practice, I'd recommend that a data scientist should have some training in economics
(specifically econometrics and game theory), but any scientific training is helpful.

And finally, I wouldn't underestimate the value of experience. There's a lot of stuff that I've
learned the hard way about cleaning data, running experiments, and implementing
solutions. Academic training is a great start, but the real world is complicated and changes
quickly. Any good training program needs to include some big, hands-on projects with
real-world data (not clean toy datasets).
Written Apr 11, 2012

Guy Cuthbert, Data Animator, https://uk.linkedin.com/in/guycuthbert
2k Views · Upvoted by Ankit Sharma, Data Scientist at DataRPM

Lots of great answers here on the technical stuff. Great, but too many graduate data
scientists (and variants on that theme: statisticians, data programmers, data analysts etc.)
are unable to communicate their findings effectively. This is a recurring theme in my
experience (see Skills for Big Data?), so I would suggest that, in addition to solid
maths & computer science skills, you should add:
- Data visualisation (even light graphic design principles)
- Rhetoric (yes, I'm serious!)
- Cognitive psychology (still serious)

Those may sound like odd suggestions, but an element of all three makes a huge difference: an
effective data scientist should be able to explain findings in a way that the audience
understands.

For the 1% who only need to communicate with engineers, you're fine with your statistics
and maths proofs... for the rest of us, the audience will consist of business people who want
to understand enough of your findings, with confidence in your method, to take some
form of corrective action.

In order to communicate effectively with this kind of audience you need to be a storyteller,
able to explain:
- What data you used, in their terminology (requiring you to have some domain
expertise)
- How you explored that data and discovered interesting patterns (visualisation
helps massively here)
- Why you believe that your findings are important (rhetorical skill helps you shape
and persuade your audience, focusing on their needs, not yours)

Above all, you need to ensure that your audience learns from your story and acts upon it,
so a little cognitive psychology will help you explain to the audience their natural biases,
how to detect and avoid false patterns, and will certainly help you shape visualisations
which convey the message you intend to deliver.
Written Mar 9, 2013