You are on page 1of 8

link

https://www.guru99.com/r-data-frames.html

# First way to declare a variable: use the `<-`


name_of_variable <- value

# Second way to declare a variable: use the `=`


name_of_variable = value

# Numerical
num <- c(1,2,222,122)
num

R VECTOR

R vector. ... Vector is a basic data structure in R. It contains element of the


same type.
The data types can be logical, integer, double, character, complex or raw.
A vector's type can be checked with the typeof() function.
Another important property of a vector is its length.

What is the difference between list and vector in R?


Vectors (one-dimensional array): can hold numeric, character or logical values.
The elements in a vector all have the same data type. ... To summarize,
the main difference between list and vector is that list in R allows you to gather
a variety of objects under one name (that is, the name of the list) in an ordered
way.

FACTOR VS VECTOR

Factors in R are stored as a vector of integer values with a corresponding set of


character values to use when the factor is displayed.
The factor function is used to create a factor. ...
Both numeric and character variables can be made into factors, but a factor's
levels will always be character values.

HERE c is combining the values in list or vectors etc.,


without giving c we can't give the words in list.
i can't add multiple numbers and character in list without c (combine) function.

ERROR without giving the c means combine.

> quantity <- (100,200,300,500)


Error: unexpected ',' in "quantity <- (100,"

# Character
char <- c ("S","U","R","E","N")
char

# BOOlean
boo <- c("True","False","True")
boo

x <- (28.0)
class(x)

y <- (12)

x-y
# Create the vectors
vec_num <- c(1,5,5,12)
vec_num1 <- c(2,2,5,3)

# Take the sum of A_vector and B_vector


sum_res <- vec_num + vec_num1
sum_res

# Slice the first five rows of the vector


slice_num <- c(1,2,3,4,5,6,7,8,9,10)

slice_num[1:5]
slice_num[5]

# Faster way to create adjacent value


c(1:8)

# Arithmetic Operators

3+4
100-99
3*5
10/2
10%%20
2^10

# Logical operator
Logical_vec <- c(1:15)
Logical_vec>6

# Print value strictly above 5


Logical_vec[(Logical_vec>6)]

# Print value 5 & 6 alone


Logical_vec[(Logical_vec>4) & (Logical_vec<7)]

What is a Matrix?
A matrix is a 2-dimensional array that has n number of rows and n number of
columns.
In other words, matrix is a combination of two or more vectors with the same data
type.

Note: It is possible to create more than two dimensions arrays with R.

Arguments:

data: The collection of elements that R will arrange into the rows and columns of
the matrix \

nrow: Number of rows

ncol: Number of columns

byrow: The rows are filled from the left to the right. We use `byrow = FALSE`
(default values),
if we want the matrix to be filled by the columns i.e. the values are filled
top to bottom.
# Construct a matrix with 5 rows that contain the numbers 1 up to 10 and byrow =
TRUE

> matrix_a <-matrix(1:10, byrow=TRUE, nrow=5,ncol=5)


> matrix_a
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 1 2 3 4 5
[4,] 6 7 8 9 10
[5,] 1 2 3 4 5

> matrix_a <-matrix(1:30, byrow=TRUE, nrow=5,ncol=5)


> matrix_a
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
[5,] 21 22 23 24 25

# Construct a matrix with 5 rows that contain the numbers 1 up to 10 and byrow =
FALSE

> matrix_a <-matrix(1:30, byrow=FALSE, nrow=5,ncol=5) // byrow = FALSE


> matrix_a
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25

# Print dimension of the matrix with dim()


> dim(matrix_a) // show the no.of rows
and columns
[1] 5 5

# concatenate c(1:5) to the matrix_a


> matrix_a1 <- cbind(matrix_a, c(1:5))
> matrix_a1
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 6 11 16 21 1
[2,] 2 7 12 17 22 2
[3,] 3 8 13 18 23 3
[4,] 4 9 14 19 24 4
[5,] 5 10 15 20 25 5

> dim(matrix_a1)
[1] 5 6

Warning message:
In matrix(1:60, byrow = FALSE, ncol = 8, nrow = 8) :
data length [60] is not a sub-multiple or multiple of the number of rows [8]

> matric_1 <- matrix(1:60, byrow=FALSE, ncol=10, nrow = 10)


> matric_1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 11 21 31 41 51 1 11 21 31
[2,] 2 12 22 32 42 52 2 12 22 32
[3,] 3 13 23 33 43 53 3 13 23 33
[4,] 4 14 24 34 44 54 4 14 24 34
[5,] 5 15 25 35 45 55 5 15 25 35
[6,] 6 16 26 36 46 56 6 16 26 36
[7,] 7 17 27 37 47 57 7 17 27 37
[8,] 8 18 28 38 48 58 8 18 28 38
[9,] 9 19 29 39 49 59 9 19 29 39
[10,] 10 20 30 40 50 60 10 20 30 40

> matric_1 <- matrix(1:60, byrow=FALSE, ncol=10, nrow = 6)


> matric_1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 7 13 19 25 31 37 43 49 55
[2,] 2 8 14 20 26 32 38 44 50 56
[3,] 3 9 15 21 27 33 39 45 51 57
[4,] 4 10 16 22 28 34 40 46 52 58
[5,] 5 11 17 23 29 35 41 47 53 59
[6,] 6 12 18 24 30 36 42 48 54 60

========================================================ERROR======================
===========================================================
> matric_1 <- matrix(1:30, byrow=FALSE, ncol=5)
> matric_d <- cbind(matrix_a1, matric_1)
Error in cbind(matrix_a1, matric_1) :
number of rows of matrices must match (see arg 2)

SOLVED

While combining the two vector into one vector the no.of rows and columns should be
same.
See the above error.

> matric_1 <- matrix(1:30, byrow=FALSE, ncol=5, nrow = 5)


> matric_d <- cbind(matrix_a1, matric_1)
> matric_d
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 1 6 11 16 21 1 1 6 11 16 21
[2,] 2 7 12 17 22 2 2 7 12 17 22
[3,] 3 8 13 18 23 3 3 8 13 18 23
[4,] 4 9 14 19 24 4 4 9 14 19 24
[5,] 5 10 15 20 25 5 5 10 15 20 25

> dim(matric_d)
[1] 5 11

FACTOR

Types
Categoraical
Non-Categorical
Contiuous Variables

R - Factors. Factors are the data objects which are used to categorize the data and
store it as levels.
They can store both strings and integers.
They are useful in the columns which have a limited number of unique values.

Factors in R. ... Factors in R are stored as a vector of integer values with a


corresponding set of character values to use when the factor is displayed.
The factor function is used to create a factor.
The only required argument to factor is a vector of values which will be returned
as a vector of factor values.

# Create gender vector


> gender_vector <- c("Male", "Female", "Female", "Male", "Male")
> class(gender_vector)
[1] "character"

# Convert gender_vector to a factor


> factor_gender_vector <-factor(gender_vector)
> class(factor_gender_vector)
[1] "factor"

It is important to transform a string into factor when we perform Machine Learning


task.

A categorical variable can be divided into nominal categorical variable and ordinal
categorical variable.

Nominal Categorical Variable


A categorical variable has several values but the order does not matter. For
instance, male or female categorical variable do not have ordering.

# Create a color vector


color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')

> color_vector
[1] "blue" "red" "green" "white" "black" "yellow"

# Convert the vector to factor


factor_color <- factor(color_vector)

factor_color
[1] blue red green white black yellow
Levels: black blue green red white yellow

Ordinal Categorical Variable


Ordinal categorical variables do have a natural ordering.
We can specify the order, from the lowest to the highest with order = TRUE and
highest to lowest with order = FALSE.

# Create Ordinal categorical vector


day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')

# Convert `day_vector` to a factor with ordered level


factor_day <- factor(day_vector, order = TRUE, levels =c('morning', 'midday',
'afternoon', 'evening', 'midnight'))

[1] evening morning afternoon midday midnight evening


Levels: morning < midday < afternoon < evening < midnight
# Print the new variable
factor_day

## Levels: morning < midday < afternoon < evening < midnight
# Append the line to above code
# Count the number of occurence of each level
summary(factor_day)

morning midday afternoon evening midnight


1 1 1 2 1

# Data Frame

a <- c(10,20,30,40)
b <- c("Suren","Ragav","Siva","Elan")
c <- c(TRUE,FALSE,TRUE,FALSE)
d <- c(2.5,8,10,7)
df <- data.frame(a,b,c,d)

df
a b c d
10 Suren TRUE 2.5
20 Ragav FALSE 8.0
30 Siva TRUE 10.0
40 Elan FALSE 7.0

# creating the header name to the data frame


names(df) <- c('ID','Names','Store','Value')

ID Names Store Value


10 Suren TRUE 2.5
20 Ragav FALSE 8.0
30 Siva TRUE 10.0
40 Elan FALSE 7.0

# Print the structure


str(df)
'data.frame': 4 obs. of 4 variables:
$ ID : num 10 20 30 40
$ Names: Factor w/ 4 levels "Elan","Ragav",..: 4 2 3 1
$ Store: logi TRUE FALSE TRUE FALSE
$ Value: num 2.5 8 10 7

## Select row 1 in column 2


df[1,2]
[1] Suren
Levels: Elan Ragav Siva Suren

## Select row 2 in column 2


df[2,2]
[1] Ragav
Levels: Elan Ragav Siva Suren

## Select Rows 1 to 3
df[1:3,]
ID Names Store Value
10 Suren TRUE 2.5
20 Ragav FALSE 8.0
30 Siva TRUE 10.0
## Select Columns 1
df[,1]
[1] 10 20 30 40

## Select Rows 1 to 3 and columns 3 to 4


df[1:3,3:4]

Store Value
TRUE 2.5
FALSE 8.0
TRUE 10.0

## Select Rows 1 to 3 and columns 2 to 4


df[1:3,2:4]

Names Store Value


Suren TRUE 2.5
Ragav FALSE 8.0
Siva TRUE 10.0

# Slice with columns name


df [, c('ID','Names')]
ID Names
10 Suren
20 Ragav
30 Siva
40 Elan

# Create a new vector

quantity <- c(10, 35, 40, 5)


df$quantity <- quantity
df
ID Names Store Value quantity
10 Suren TRUE 2.5 10
20 Ragav FALSE 8.0 35
30 Siva TRUE 10.0 40
40 Elan FALSE 7.0 5

# Add `quantity` to the `df` data frame


> quantity <- c(100,200,300)
> df$quantity <- quantity

Error in `$<-.data.frame`(`*tmp*`, quantity, value = c(100, 200, 300)) :


replacement has 3 rows, data has 4

> quantity <- c(100,200,300,500)


> df$quantity <- quantity
> df
ID Names Store Value quantity
1 10 Suren TRUE 2.5 100
2 20 Ragav FALSE 8.0 200
3 30 Siva TRUE 10.0 300
4 40 Elan FALSE 7.0 500

# select the columns


df$Store
[1] TRUE FALSE TRUE FALSE
df$Names
[1] Suren Ragav Siva Elan
Levels: Elan Ragav Siva Suren

# Select price above 100


subset(df, subset = quantity > 100)
ID Names Store Value quantity
20 Ragav FALSE 8 200
30 Siva TRUE 10 300
40 Elan FALSE 7 500

# Select price above and equal to 100


subset(df, subset = quantity >= 100)
ID Names Store Value quantity
10 Suren TRUE 2.5 100
20 Ragav FALSE 8.0 200
30 Siva TRUE 10.0 300
40 Elan FALSE 7.0 500

## Select Value above 5.0


subset(df, subset = Value > 5.0)
ID Names Store Value quantity
30 Siva TRUE 10 300

# Select Value above 10.0 and quantity above 100


(df <- df[with(df, quantity >= 100 & Value ==10.0), ])
ID Names Store Value quantity
30 Siva TRUE 10 300

# Select Value above 10.0 and quantity above 100


subset(df, subset = quantity >= 100 && Value ==10.0)
ID Names Store Value quantity
30 Siva TRUE 10 300

You might also like