Discover Insights from Data
Before You Start
Prerequisites: Thank you for your interest in the Data Analyst Nanodegree! To succeed in this program, we recommend having experience programming in Python. If you've never programmed before, or want a refresher, there is an Introduction to Python Programming course in the extracurricular section of the Nanodegree program.
Educational Objectives: Learn to organize data, uncover patterns and insights, make predictions using machine learning, and clearly communicate critical findings.
Length of Program*: 260 Hours
Frequency of Classes: Self-paced
Textbooks Required: None
Instructional Tools Available: Video lectures, 1:1 appointments, forum support
*This is a self-paced program, and the length is an estimate of the total hours the average student may take to complete all required coursework, including lecture and project time. Actual hours may vary.
Intro Project: Analyze Bay Area Bike Share Data (10 hrs)
This project will introduce you to the key steps of the data analysis process. You'll do so by analyzing data from a bike share company based in the San Francisco Bay Area. You'll submit this project within your first 7 days, and by the end you'll be able to:
Use basic Python code to clean a dataset for analysis
Run code to create visualizations from the wrangled data
Analyze trends shown in the visualizations and report your conclusions
Determine if this program is a good fit for your time and talents
Project: Compute Statistics from Card Draws (20 hrs)
In this project, you will demonstrate your knowledge of descriptive statistics by conducting an experiment that involves drawing from a deck of playing cards and creating a write-up containing your findings. This project is self-graded.
Supporting Lesson Content: Statistics
Lesson Title: Learning Outcomes
INTRO TO RESEARCH METHODS: Identify several statistical study methods and describe the positives and negatives of each.
VISUALIZING DATA: Create and interpret histograms, bar charts, and frequency plots.
CENTRAL TENDENCY: Compute and interpret the 3 measures of center for distributions: the mean, median, and mode.
VARIABILITY: Quantify the spread of data using the range and standard deviation; identify outliers in datasets using the interquartile range.
STANDARDIZING: Convert distributions into the standard normal distribution using the Z-score; compute proportions using standardized distributions.
NORMAL DISTRIBUTION: Use normal distributions to compute probabilities; use the Z-table to look up the proportions of observations above, below, or in between values.
SAMPLING DISTRIBUTIONS: Apply the concepts of probability and normalization to sample datasets.
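As an illustration of the descriptive-statistics outcomes above (measures of center, spread, IQR-based outliers, Z-scores, and Z-table proportions), here is a minimal sketch using only Python's standard library. It is not course material: the function names are hypothetical, it uses the population standard deviation, and it assumes the common 1.5 * IQR rule with the "inclusive" quartile method (other quartile conventions give slightly different fences).

```python
import statistics
from statistics import NormalDist


def describe(values):
    """Measures of center and spread for a list of numbers (illustrative helper)."""
    data = sorted(values)
    # Quartile cut points Q1, Q2, Q3; "inclusive" is one of several conventions.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed 1.5 * IQR outlier fences
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": data[-1] - data[0],
        "stdev": statistics.pstdev(data),  # population standard deviation
        "outliers": [x for x in data if x < low or x > high],
    }


def z_scores(values):
    """Standardize values: z = (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


def proportion_below(z):
    """Proportion of a standard normal distribution below z (what a Z-table lists)."""
    return NormalDist().cdf(z)


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]
    print(describe(sample))
    print(z_scores(sample))
    print(round(proportion_below(1.96), 3))
```

For the sample above, the mean is 5, the median 4.5, the mode 4, the range 7, and the population standard deviation 2.0; the fences are 1.75 and 7.75, so only 9 is flagged. `proportion_below(1.96)` is about 0.975, matching the familiar Z-table entry.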
Project: Investigate a Dataset (30 hrs)
In this project, you'll choose one of Udacity's curated datasets and investigate it using NumPy and pandas. You'll complete the entire data analysis process, starting by posing a question and finishing by sharing your findings.
Supporting Lesson Content: Introduction to Data Analysis
Lesson Title: Learning Outcomes
DATA ANALYSIS PROCESS: Identify the key steps in the data analysis process; complete an analysis of Udacity student data using pure Python, with minimal reliance on additional libraries.
NUMPY AND PANDAS FOR 1D DATA: Use NumPy arrays, pandas Series, and vectorized operations to ease the data analysis process.
NUMPY AND PANDAS FOR 2D DATA: Use two-dimensional NumPy arrays and pandas DataFrames; understand how to group data and to combine data from multiple files.
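The pure-Python analysis and the grouping and combining outcomes above can be sketched with the standard library's csv module. This is a minimal illustration, not the course's code: the two in-memory CSV "files", their column names (account_key, status, minutes), and the helper names are all hypothetical. It joins engagement records to enrollment records by account key and totals minutes per enrollment status, roughly what a pandas merge followed by a groupby would do.

```python
import csv
import io
from collections import defaultdict

# Two small in-memory CSV "files"; the column names are hypothetical.
ENROLLMENTS_CSV = """account_key,status
1,current
2,canceled
3,current
"""

ENGAGEMENT_CSV = """account_key,minutes
1,30.5
1,12.0
2,5.0
3,40.0
3,20.0
4,99.0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def total_minutes_by_status(enrollments, engagement):
    """Join engagement rows to enrollments on account_key, then sum minutes per status."""
    # Lookup table for the join: account_key -> status.
    status_by_account = {row["account_key"]: row["status"] for row in enrollments}
    totals = defaultdict(float)
    for row in engagement:
        status = status_by_account.get(row["account_key"])
        if status is None:
            continue  # no matching enrollment (inner-join behavior)
        totals[status] += float(row["minutes"])
    return dict(totals)


if __name__ == "__main__":
    enrollments = read_rows(ENROLLMENTS_CSV)
    engagement = read_rows(ENGAGEMENT_CSV)
    print(total_minutes_by_status(enrollments, engagement))
```

Here account 4 has engagement data but no enrollment row, so it is dropped; the result is 102.5 minutes for "current" and 5.0 for "canceled". Whether unmatched rows should be dropped or kept is an analysis decision, not something this sketch settles.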
Project: Wrangle OpenStreetMap Data (60 hrs)
In this project, you'll use data munging techniques, such as assessing the quality of the data for validity, accuracy, completeness, consistency, and uniformity, to clean the OpenStreetMap data for a part of the world that you care about.
Supporting Lesson Content: Data Wrangling with SQL
Lesson Title: Learning Outcomes
DATA EXTRACTION FUNDAMENTALS: Properly assess the quality of a dataset; understand how to parse CSV files, and XLS files with xlrd; use JSON and web APIs.
DATA IN MORE COMPLEX FORMATS: Understand XML design principles; parse XML and HTML; scrape websites for relevant data.
DATA QUALITY: Understand common sources of dirty data; measure the quality of a dataset and apply a blueprint for cleaning; properly audit the validity, accuracy, completeness, consistency, and uniformity of a dataset.
ANALYZING DATA: Identify common examples of the MongoDB aggregation framework; use the aggregation pipeline operators $match, $project, $unwind, and $group.
SQL FOR DATA ANALYSIS: Understand how data is structured in SQL; run queries to summarize data; use joins to combine information across tables; create tables and import data from CSV.
CASE STUDY (OPENSTREETMAP DATA): Use iterative parsing for large data files; understand XML elements in OpenStreetMap.
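For the OpenStreetMap case study, iterative parsing means processing the file element by element instead of loading the whole tree at once. Below is a minimal sketch with Python's `xml.etree.ElementTree.iterparse`; the tiny embedded OSM sample and the list of expected street types are made up for illustration, and clearing each finished node/way/relation is one common way (assumed here) to reduce memory use.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny, made-up OSM extract; real files are usually far larger.
SAMPLE_OSM = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="37.77" lon="-122.42">
    <tag k="amenity" v="cafe"/>
    <tag k="addr:street" v="Market St."/>
  </node>
  <node id="2" lat="37.78" lon="-122.41">
    <tag k="addr:street" v="Mission Street"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""

EXPECTED_STREET_TYPES = {"Street", "Avenue", "Road", "Boulevard"}  # assumed list
TOP_LEVEL = {"node", "way", "relation"}


def audit_osm(source):
    """Count element tags and unexpected street types in one streaming pass."""
    tag_counts = Counter()
    odd_street_types = Counter()
    # "end" events fire once an element and all its children have been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag_counts[elem.tag] += 1
        if elem.tag == "tag" and elem.get("k") == "addr:street":
            street_type = elem.get("v").split()[-1]
            if street_type not in EXPECTED_STREET_TYPES:
                odd_street_types[street_type] += 1
        if elem.tag in TOP_LEVEL:
            elem.clear()  # drop this element's children and attributes
    return tag_counts, odd_street_types


if __name__ == "__main__":
    counts, odd = audit_osm(io.BytesIO(SAMPLE_OSM))
    print(counts)
    print(odd)
```

On the sample, the audit flags one "St." abbreviation, the kind of inconsistency the data-quality lessons suggest cleaning before loading the data into SQL.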
Project: Explore and Summarize Data (50 hrs)
In this project, you'll use R and apply exploratory data analysis techniques to explore a selected dataset for distributions, outliers, and anomalies.
Supporting Lesson Content: Data Analysis with R
Lesson Title: Learning Outcomes
WHAT IS EDA?: Define and identify the importance of exploratory data analysis (EDA).
R BASICS: Install RStudio and packages; write basic R scripts to inspect datasets.
EXPLORE ONE VARIABLE: Quantify and visualize individual variables within a dataset; create histograms and boxplots; transform variables; examine and identify tradeoffs in visualizations.
EXPLORE TWO VARIABLES: Properly apply relevant techniques for exploring the relationship between any two variables in a dataset; create scatterplots; calculate correlations; investigate conditional means.
EXPLORE MANY VARIABLES: Reshape data frames and use aesthetics like color and shape to uncover information.
DIAMONDS AND PRICE PREDICTIONS: Use predictive modeling to determine a good price for a diamond.
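The lessons use R, but the two-variable ideas above (correlations and conditional means) are language-independent. This minimal Python sketch, with hypothetical function names, computes Pearson's correlation coefficient directly from its definition and the mean of one variable for each value of another; in R, `cor()` and grouped summaries play these roles.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors in covariance and standard deviation cancel, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (spread_x * spread_y)


def conditional_means(xs, ys):
    """Mean of y for each distinct value of x."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: sum(vals) / len(vals) for x, vals in groups.items()}


if __name__ == "__main__":
    print(pearson_r([1, 2, 3], [2, 4, 6]))
    print(conditional_means([1, 1, 2], [10, 20, 5]))
```

A perfectly linear increasing pair gives r close to 1.0, and `conditional_means([1, 1, 2], [10, 20, 5])` returns `{1: 15.0, 2: 5.0}`.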
Project: Test a Perceptual Phenomenon (20 hrs)
In this project, you'll use descriptive statistics and a statistical test to analyze the Stroop effect, a classic result of experimental psychology. You'll communicate your understanding of the data and use statistical inference to draw a conclusion based on the results.
Supporting Lesson Content: Inferential Statistics
Lesson Title: Learning Outcomes
ESTIMATION: Estimate population parameters from sample statistics using confidence intervals.
HYPOTHESIS TESTING: Use critical values to decide whether or not a treatment has changed the value of a population parameter.
T-TESTS: Test the effect of a treatment or compare the difference in means for two groups when sample sizes are small.
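The estimation, hypothesis-testing, and t-test outcomes come together in the Stroop project. Assuming, as is typical for Stroop data, that each participant is timed under both a congruent and an incongruent condition, a dependent-samples (paired) t-test fits. Here is a minimal Python sketch under that assumption; the function name and example data are hypothetical, and the critical value is passed in (for example, read from a t-table) because the standard library has no t distribution.

```python
import math
import statistics


def paired_t_test(congruent, incongruent, t_critical):
    """Paired t statistic and confidence interval for mean(incongruent - congruent)."""
    if len(congruent) != len(incongruent) or len(congruent) < 2:
        raise ValueError("need paired samples with at least 2 participants")
    diffs = [b - a for a, b in zip(congruent, incongruent)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    se = sd_diff / math.sqrt(n)        # standard error of the mean difference
    t_stat = mean_diff / se
    margin = t_critical * se
    return {
        "t": t_stat,
        "df": n - 1,
        "mean_diff": mean_diff,
        "ci": (mean_diff - margin, mean_diff + margin),
        "reject_null": abs(t_stat) > t_critical,  # two-tailed decision
    }


if __name__ == "__main__":
    congruent = [10, 12, 14, 16]    # hypothetical completion times (seconds)
    incongruent = [12, 13, 17, 18]
    print(paired_t_test(congruent, incongruent, t_critical=3.182))
```

With `t_critical=3.182` (the two-tailed 5% critical value for 3 degrees of freedom in a standard t-table), the example gives a mean difference of 2, t of about 4.90, and a confidence interval of roughly (0.70, 3.30), so the null hypothesis of no difference would be rejected.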
Project: Identify Fraud from Enron Email (50 hrs)
In this project, you'll play detective and put your machine learning skills to use by building an algorithm to identify Enron employees who may have committed fraud based on the public Enron financial and email dataset.
Supporting Lesson Content: Introduction to Machine Learning
Lesson Title: Learning Outcomes
SUPERVISED CLASSIFICATION: Implement the Naive Bayes algorithm to classify text; implement Support Vector Machines (SVMs), which can use kernel functions to create new features on the fly; implement decision trees as a launching point for more sophisticated methods like random forests and boosting.
DATASETS AND QUESTIONS: Wrestle the Enron dataset into a machine-learning-ready format in preparation for detecting cases of fraud.
REGRESSIONS AND OUTLIERS: Use regression algorithms to make predictions, and identify and clean outliers from a dataset.
UNSUPERVISED LEARNING: Use the k-means clustering algorithm for pattern-searching on unlabeled data.
FEATURES, FEATURES, FEATURES: Use feature creation to turn your human intuition and raw features into data a computer can use; use feature selection to identify the most important features of your data; implement principal component analysis (PCA) for a more sophisticated approach to reducing the number of features; use tools for parsing information from text data.
VALIDATION AND EVALUATION: Implement the train-test split and cross-validation to validate and understand machine learning results; quantify machine learning results using precision, recall, and F1 score.
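To make the evaluation metrics above concrete, here is a minimal Python sketch that computes precision, recall, and F1 score from lists of true and predicted labels. It assumes binary labels with 1 marking the positive class (for the Enron project, a hypothetical encoding of "person of interest"); the function name is illustrative, and libraries such as scikit-learn provide equivalent functions (`precision_score`, `recall_score`, `f1_score`).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one positive class; 0.0 where a ratio is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]
    print(precision_recall_f1(y_true, y_pred))
```

In the example there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 2/3. Returning 0.0 for undefined ratios is a design choice; some libraries warn instead.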
Project: Make an Effective Visualization (20 hrs)
In this project, you'll use Tableau to create a data visualization that tells a story or highlights trends or patterns in a dataset. Your work should reflect the theory and practice of data visualization, harnessing visual encodings and design principles for effective communication.
Supporting Lesson Content: Data Visualization with Tableau
Lesson Title: Learning Outcomes
DATA VISUALIZATION FUNDAMENTALS: Understand the importance of data visualization; know how different data types are encoded in visualizations.
DESIGN PRINCIPLES: Select the most effective chart or graph based on the data being displayed; use color, shape, size, and other elements effectively.
CREATING VISUALIZATIONS WITH TABLEAU: Become proficient in basic Tableau functionality, such as charts, filters, and hierarchies; create calculated fields in Tableau.
TELLING STORIES WITH TABLEAU: Create Tableau dashboards and stories to communicate data effectively.