Uploaded by Ambalika Smiti

Python Data Analysis - Analytics Vidhya


Data Exploration in Python using NumPy, Pandas, Matplotlib

CHEATSHEET

NumPy: NumPy stands for Numerical Python. This library contains basic linear algebra functions, Fourier transforms, and advanced random number capabilities.

Pandas: Used for data operations and manipulations. It is extensively used for data munging and preparation.

Matplotlib: A Python plotting library with complete 2D support along with limited 3D graphics support.

Contents

1. How to load data file(s)?
2. How to convert a variable to a different data type?
3. How to transpose a table?
4. How to sort data?
5. How to create plots (Histogram, Scatter, Box Plot)?
6. How to generate frequency tables?
7. How to sample a data set?
8. How to remove duplicate values of a variable?
9. How to group variables to calculate count, average, sum?
10. How to recognize and treat missing values and outliers?
11. How to merge / join data sets effectively?

1. How to load data file(s)?

Here are some common functions used to read data.

CODE

import pandas as pd  # Import the pandas library

# Read the dataset into a DataFrame (the path is from a Windows environment)
df = pd.read_csv("E:/train.csv")
print(df.head(3))  # Print the first three observations

CODE

# Load the "Data" sheet of the Excel file EMP
df = pd.read_excel("E:/EMP.xlsx", sheet_name="Data")

CODE

# Load data from a text file with a tab (\t) delimiter
df = pd.read_csv("E:/Test.txt", sep="\t")
print(df)
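To try the tab-delimited case without a file on disk, the sketch below reads the same kind of data from an in-memory string. The column names and values are made up for illustration and are not from the cheat sheet.

```python
import io

import pandas as pd

# Hypothetical tab-delimited content standing in for a file such as Test.txt
raw_text = "ID\tName\tAge\n1\tAsha\t34\n2\tRavi\t29\n3\tMeena\t41\n"

# sep="\t" tells read_csv that the columns are separated by tabs
df_text = pd.read_csv(io.StringIO(raw_text), sep="\t")

print(df_text.head(2))  # first two observations
print(df_text.shape)    # (number of rows, number of columns)
```

Passing a file path instead of the StringIO object reads from disk in the same way.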

2. How to convert a variable to a different data type?

- Convert numeric variables to string variables and vice versa

string_outcome = str(numeric_input)   # Converts numeric_input to string_outcome
integer_outcome = int(string_input)   # Converts string_input to integer_outcome
float_outcome = float(string_input)   # Converts string_input to float_outcome

- Convert a character date to a date object

from datetime import datetime

char_date = 'Apr 1 2015 1:20 PM'  # Creating an example character date
date_obj = datetime.strptime(char_date, '%b %d %Y %I:%M %p')
print(date_obj)
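The same conversions are often needed for a whole DataFrame column rather than a single value. A minimal sketch, assuming a small made-up table: astype and pd.to_datetime are standard pandas calls, while the column names and values are hypothetical.

```python
import pandas as pd

# Made-up data: numbers and dates stored as strings
df_types = pd.DataFrame({
    "Sales": ["100", "250", "75"],
    "Joined": ["Apr 1 2015 1:20 PM", "May 12 2015 9:05 AM", "Jun 3 2015 4:45 PM"],
})

df_types["Sales"] = df_types["Sales"].astype(int)      # string -> integer
df_types["Sales_str"] = df_types["Sales"].astype(str)  # integer -> string

# Character date -> datetime, using the same format codes as strptime
df_types["Joined"] = pd.to_datetime(df_types["Joined"], format="%b %d %Y %I:%M %p")

print(df_types.dtypes)
```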

3. How to transpose a table?

CODE

# Transposing a DataFrame by a variable
df = pd.read_excel("E:/transpose.xlsx", sheet_name="Sheet1")  # Load Sheet1 of the Excel file transpose
print(df)

# One row per ID, one column per Product, with Sales as the cell values
result = df.pivot(index='ID', columns='Product', values='Sales')
print(result)
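Since the data set and output images are not reproduced here, the following sketch shows the same pivot call on a small made-up table. The values are hypothetical; only the column names ID, Product and Sales come from the code above.

```python
import pandas as pd

# Made-up long-format data: one row per (ID, Product) pair
df_long = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Product": ["A", "B", "A", "B"],
    "Sales": [10, 20, 30, 40],
})

# One row per ID, one column per Product, cells filled with Sales
df_wide = df_long.pivot(index="ID", columns="Product", values="Sales")
print(df_wide)
```

pivot raises an error if an (ID, Product) pair occurs more than once; pivot_table with an aggregation function handles that case.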

4. How to sort data?

CODE

# Sorting a DataFrame
df = pd.read_excel("E:/transpose.xlsx", sheet_name="Sheet1")

# List the variable name(s) to sort by: Product ascending, Sales descending
print(df.sort_values(['Product', 'Sales'], ascending=[True, False]))
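A self-contained sketch of the same two-key sort, using made-up values in place of the Excel file:

```python
import pandas as pd

# Hypothetical data standing in for transpose.xlsx
df_sales = pd.DataFrame({
    "Product": ["B", "A", "B", "A"],
    "Sales": [40, 10, 20, 30],
})

# Product ascending (A before B), then Sales descending within each Product
df_sorted = df_sales.sort_values(["Product", "Sales"], ascending=[True, False])
print(df_sorted)
```

sort_values returns a new DataFrame and leaves df_sales unchanged.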

5. How to create plots (Histogram, Scatter, Box Plot)?

Histogram

CODE

# Plot histogram
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_excel("E:/First.xlsx", sheet_name="Sheet1")

# Plots in matplotlib reside within a figure object; use plt.figure to create a new figure
fig = plt.figure()

# Create one or more subplots using add_subplot, because you can't make a plot with a blank figure
ax = fig.add_subplot(1, 1, 1)

# Variable
ax.hist(df['Age'], bins=5)

# Labels and title
plt.title('Age distribution')
plt.xlabel('Age')
plt.ylabel('#Employee')
plt.show()

Scatter plot

CODE

# Plot scatter plot (uses plt and df from the histogram example)

# Plots in matplotlib reside within a figure object; use plt.figure to create a new figure
fig = plt.figure()

# Create one or more subplots using add_subplot, because you can't make a plot with a blank figure
ax = fig.add_subplot(1, 1, 1)

# Variables
ax.scatter(df['Age'], df['Sales'])

# Labels and title
plt.title('Sales and Age distribution')
plt.xlabel('Age')
plt.ylabel('Sales')
plt.show()

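Box-plot

CODE

import seaborn as sns  # The box plot is drawn with the seaborn library

sns.boxplot(x=df['Age'])
sns.despine()  # Remove the top and right axes spines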

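6. How to generate frequency tables?

CODE

import pandas as pd

df = pd.read_excel("E:/First.xlsx", sheet_name="Sheet1")
print(df)

# Count the observations in each Gender / BMI combination
test = df.groupby(['Gender', 'BMI'])
print(test.size())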
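Two other standard pandas calls produce frequency tables directly: value_counts for one variable and pd.crosstab for two. The sketch below uses made-up Gender and BMI values, not the contents of First.xlsx.

```python
import pandas as pd

# Hypothetical data with the same column names as the example above
df_freq = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F"],
    "BMI": ["Normal", "Normal", "Over", "Over", "Normal"],
})

# One-way frequency table for a single variable
gender_counts = df_freq["Gender"].value_counts()

# Two-way frequency table: Gender in rows, BMI in columns
freq_table = pd.crosstab(df_freq["Gender"], df_freq["BMI"])

print(gender_counts)
print(freq_table)
```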


7. How to sample a data set?

CODE

import numpy as np
import pandas as pd
from random import sample

# Create a random index of 5 row positions
rindex = np.array(sample(range(len(df)), 5))

# Get 5 random rows from df
dfr = df.iloc[rindex]
print(dfr)
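pandas also has a built-in DataFrame.sample method that draws rows without building an index by hand. A minimal sketch on a made-up table; the random_state value is an arbitrary choice to make the draw repeatable.

```python
import pandas as pd

# Hypothetical data set of 20 rows
df_pop = pd.DataFrame({"ID": range(1, 21), "Age": range(21, 41)})

# Draw 5 rows without replacement; random_state makes the draw repeatable
df_sample = df_pop.sample(n=5, random_state=42)
print(df_sample)
```

Use frac=0.25 instead of n=5 to sample a proportion of the rows.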

8. How to remove duplicate values of a variable?

CODE

# Remove duplicate rows based on the values of the variables "Gender" and "BMI"
rem_dup = df.drop_duplicates(['Gender', 'BMI'])
print(rem_dup)
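By default drop_duplicates keeps the first row of each duplicate group. The sketch below, on made-up data, shows that default next to keep="last".

```python
import pandas as pd

# Hypothetical data: the first two rows share the same Gender and BMI
df_dup = pd.DataFrame({
    "Gender": ["F", "F", "M", "M"],
    "BMI": ["Normal", "Normal", "Over", "Normal"],
    "Age": [30, 35, 40, 45],
})

first_kept = df_dup.drop_duplicates(["Gender", "BMI"])              # keeps Age 30
last_kept = df_dup.drop_duplicates(["Gender", "BMI"], keep="last")  # keeps Age 35

print(first_kept)
print(last_kept)
```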

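9. How to group variables to calculate count, average, sum?

CODE

# Summary statistics (count, mean, std, min, quartiles, max) of each numeric variable by Gender
test = df.groupby(['Gender'])
print(test.describe())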
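describe does not report a sum. To get exactly the count, average and sum per group, one option is agg with a list of function names. A sketch with made-up values; the Sales column is an assumption for illustration.

```python
import pandas as pd

# Hypothetical data
df_grp = pd.DataFrame({
    "Gender": ["F", "M", "F", "M"],
    "Sales": [100, 200, 300, 400],
})

# One row per Gender with the count, mean and sum of Sales
summary = df_grp.groupby("Gender")["Sales"].agg(["count", "mean", "sum"])
print(summary)
```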

10. How to recognize and treat missing values and outliers?

CODE

# Identify missing values of the DataFrame
df.isnull()

CODE

# Example: impute missing values in Age with the mean
import numpy as np

# Use the numpy mean function to calculate the mean value
meanAge = np.mean(df.Age)

# Replace missing values in the DataFrame
df.Age = df.Age.fillna(meanAge)
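The code above covers missing values only. For outliers, one common approach is the 1.5 x IQR rule; it is an assumption of this sketch and not taken from the cheat sheet, as are the data values and the choice to cap rather than drop.

```python
import numpy as np
import pandas as pd

# Hypothetical Age column with one missing value and one extreme value
df_age = pd.DataFrame({"Age": [25, 30, np.nan, 35, 40, 120]})

# IQR rule (assumption of this sketch): flag values more than
# 1.5 * IQR below the first quartile or above the third quartile
q1, q3 = df_age["Age"].quantile([0.25, 0.75])  # missing values are ignored
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df_age[(df_age["Age"] < lower) | (df_age["Age"] > upper)]
print(outliers)

# One possible treatment: cap values at the fences, then impute the mean
df_age["Age_capped"] = df_age["Age"].clip(lower, upper)
df_age["Age_capped"] = df_age["Age_capped"].fillna(df_age["Age_capped"].mean())
print(df_age)
```

Capping before imputing keeps the extreme value from pulling the imputed mean upward.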

11. How to merge / join data sets effectively?

CODE

# Merges df1 and df2 on index
# By changing how = 'outer', you can do an outer join.
# Similarly, how = 'left' will do a left join.
# You can also specify the columns to join on (with on=) instead of the indexes.
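The merge call that these comments describe is not present in the text. The sketch below is one plausible form using pd.merge, assuming an inner join on the index as the starting point; df1, df2 and their contents are made up for illustration.

```python
import pandas as pd

# Hypothetical tables that share some index values
df1 = pd.DataFrame({"Age": [30, 40, 50]}, index=[1, 2, 3])
df2 = pd.DataFrame({"Sales": [100, 200, 300]}, index=[2, 3, 4])

# Inner join on index: keeps only index values present in both tables
df_inner = pd.merge(df1, df2, how="inner", left_index=True, right_index=True)

# Outer join: keeps index values present in either table
df_outer = pd.merge(df1, df2, how="outer", left_index=True, right_index=True)

# Left join: keeps every row of df1
df_left = pd.merge(df1, df2, how="left", left_index=True, right_index=True)

# Join on a column instead of the index
left_tbl = df1.rename_axis("ID").reset_index()
right_tbl = df2.rename_axis("ID").reset_index()
df_on_col = pd.merge(left_tbl, right_tbl, on="ID", how="inner")

print(df_inner)
print(df_on_col)
```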
