Basic

link
https://www.guru99.com/r-data-frames.html
# First way to declare a variable: use the `<-`

name_of_variable <- value
# Second way to declare a variable: use the `=`

name_of_variable = value
# Numerical
num <- c(1,2,222,122)
num
R VECTOR
R vector. ... Vector is a basic data structure in R. It contains element of the

same type.
The data types can be logical, integer, double, character, complex or raw.
A vector's type can be checked with the typeof() function.
Another important property of a vector is its length.
What is the difference between list and vector in R?

Vectors (one-dimensional array): can hold numeric, character or logical values.
The elements in a vector all have the same data type. ... To summarize,
the main difference between list and vector is that list in R allows you to gather
a variety of objects under one name (that is, the name of the list) in an ordered
way.
FACTOR VS VECTOR
Factors in R are stored as a vector of integer values with a corresponding set of

character values to use when the factor is displayed.
The factor function is used to create a factor. ...
Both numeric and character variables can be made into factors, but a factor's
levels will always be character values.
HERE c is combining the values in list or vectors etc.,

without giving c we can't give the words in list.
i can't add multiple numbers and character in list without c (combine) function.
ERROR without giving the c means combine.
> quantity <- (100,200,300,500)

Error: unexpected ',' in "quantity <- (100,"
# Character
char <- c ("S","U","R","E","N")
char
# BOOlean
boo <- c("True","False","True")
boo
x <- (28.0)
class(x)
y <- (12)
x-y
# Create the vectors
vec_num <- c(1,5,5,12)
vec_num1 <- c(2,2,5,3)
# Take the sum of A_vector and B_vector

sum_res <- vec_num + vec_num1
sum_res
# Slice the first five rows of the vector

slice_num <- c(1,2,3,4,5,6,7,8,9,10)
slice_num[1:5]
slice_num[5]
# Faster way to create adjacent value

c(1:8)
# Arithmetic Operators
3+4
100-99
3*5
10/2
10%%20
2^10
# Logical operator
Logical_vec <- c(1:15)
Logical_vec>6
# Print value strictly above 5

Logical_vec[(Logical_vec>6)]
# Print value 5 & 6 alone

Logical_vec[(Logical_vec>4) & (Logical_vec<7)]
What is a Matrix?
A matrix is a 2-dimensional array that has n number of rows and n number of
columns.
In other words, matrix is a combination of two or more vectors with the same data
type.
Note: It is possible to create more than two dimensions arrays with R.
Arguments:
data: The collection of elements that R will arrange into the rows and columns of
the matrix \
nrow: Number of rows
ncol: Number of columns
byrow: The rows are filled from the left to the right. We use `byrow = FALSE`
(default values),
if we want the matrix to be filled by the columns i.e. the values are filled
top to bottom.
# Construct a matrix with 5 rows that contain the numbers 1 up to 10 and byrow =
TRUE
> matrix_a <-matrix(1:10, byrow=TRUE, nrow=5,ncol=5)

> matrix_a
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 1 2 3 4 5
[4,] 6 7 8 9 10
[5,] 1 2 3 4 5
> matrix_a <-matrix(1:30, byrow=TRUE, nrow=5,ncol=5)

> matrix_a
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
[5,] 21 22 23 24 25
# Construct a matrix with 5 rows that contain the numbers 1 up to 10 and byrow =
FALSE
> matrix_a <-matrix(1:30, byrow=FALSE, nrow=5,ncol=5) // byrow = FALSE

> matrix_a
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
# Print dimension of the matrix with dim()

> dim(matrix_a) // show the no.of rows
and columns
[1] 5 5
# concatenate c(1:5) to the matrix_a

> matrix_a1 <- cbind(matrix_a, c(1:5))
> matrix_a1
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 6 11 16 21 1
[2,] 2 7 12 17 22 2
[3,] 3 8 13 18 23 3
[4,] 4 9 14 19 24 4
[5,] 5 10 15 20 25 5
> dim(matrix_a1)
[1] 5 6
Warning message:
In matrix(1:60, byrow = FALSE, ncol = 8, nrow = 8) :
data length [60] is not a sub-multiple or multiple of the number of rows [8]
> matric_1 <- matrix(1:60, byrow=FALSE, ncol=10, nrow = 10)

> matric_1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 11 21 31 41 51 1 11 21 31
[2,] 2 12 22 32 42 52 2 12 22 32
[3,] 3 13 23 33 43 53 3 13 23 33
[4,] 4 14 24 34 44 54 4 14 24 34
[5,] 5 15 25 35 45 55 5 15 25 35
[6,] 6 16 26 36 46 56 6 16 26 36
[7,] 7 17 27 37 47 57 7 17 27 37
[8,] 8 18 28 38 48 58 8 18 28 38
[9,] 9 19 29 39 49 59 9 19 29 39
[10,] 10 20 30 40 50 60 10 20 30 40

> matric_1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 7 13 19 25 31 37 43 49 55
[2,] 2 8 14 20 26 32 38 44 50 56
[3,] 3 9 15 21 27 33 39 45 51 57
[4,] 4 10 16 22 28 34 40 46 52 58
[5,] 5 11 17 23 29 35 41 47 53 59
[6,] 6 12 18 24 30 36 42 48 54 60
========================================================ERROR======================
===========================================================
> matric_1 <- matrix(1:30, byrow=FALSE, ncol=5)
> matric_d <- cbind(matrix_a1, matric_1)
Error in cbind(matrix_a1, matric_1) :
number of rows of matrices must match (see arg 2)
SOLVED
While combining the two vector into one vector the no.of rows and columns should be
same.
See the above error.

> matric_d <- cbind(matrix_a1, matric_1)
> matric_d
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 1 6 11 16 21 1 1 6 11 16 21
[2,] 2 7 12 17 22 2 2 7 12 17 22
[3,] 3 8 13 18 23 3 3 8 13 18 23
[4,] 4 9 14 19 24 4 4 9 14 19 24
[5,] 5 10 15 20 25 5 5 10 15 20 25
> dim(matric_d)
[1] 5 11
FACTOR
Types
Categoraical
Non-Categorical
Contiuous Variables
R - Factors. Factors are the data objects which are used to categorize the data and
store it as levels.
They can store both strings and integers.
They are useful in the columns which have a limited number of unique values.
Factors in R. ... Factors in R are stored as a vector of integer values with a

corresponding set of character values to use when the factor is displayed.
The factor function is used to create a factor.
The only required argument to factor is a vector of values which will be returned
as a vector of factor values.
# Create gender vector

> gender_vector <- c("Male", "Female", "Female", "Male", "Male")
> class(gender_vector)
[1] "character"
# Convert gender_vector to a factor

> factor_gender_vector <-factor(gender_vector)
> class(factor_gender_vector)
[1] "factor"
It is important to transform a string into factor when we perform Machine Learning

task.
A categorical variable can be divided into nominal categorical variable and ordinal
categorical variable.
Nominal Categorical Variable

A categorical variable has several values but the order does not matter. For
instance, male or female categorical variable do not have ordering.
# Create a color vector

color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
> color_vector
[1] "blue" "red" "green" "white" "black" "yellow"
# Convert the vector to factor

factor_color <- factor(color_vector)
factor_color
[1] blue red green white black yellow
Levels: black blue green red white yellow
Ordinal Categorical Variable

Ordinal categorical variables do have a natural ordering.
We can specify the order, from the lowest to the highest with order = TRUE and
highest to lowest with order = FALSE.
# Create Ordinal categorical vector

day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
# Convert `day_vector` to a factor with ordered level

factor_day <- factor(day_vector, order = TRUE, levels =c('morning', 'midday',
'afternoon', 'evening', 'midnight'))
[1] evening morning afternoon midday midnight evening

Levels: morning < midday < afternoon < evening < midnight
# Print the new variable
factor_day
## Levels: morning < midday < afternoon < evening < midnight
# Append the line to above code
# Count the number of occurence of each level
summary(factor_day)
morning midday afternoon evening midnight

1 1 1 2 1
# Data Frame
a <- c(10,20,30,40)
b <- c("Suren","Ragav","Siva","Elan")
c <- c(TRUE,FALSE,TRUE,FALSE)
d <- c(2.5,8,10,7)
df <- data.frame(a,b,c,d)
df
a b c d
10 Suren TRUE 2.5
20 Ragav FALSE 8.0
30 Siva TRUE 10.0
40 Elan FALSE 7.0
# creating the header name to the data frame

names(df) <- c('ID','Names','Store','Value')
ID Names Store Value

10 Suren TRUE 2.5
20 Ragav FALSE 8.0
30 Siva TRUE 10.0
40 Elan FALSE 7.0
# Print the structure

str(df)
'data.frame': 4 obs. of 4 variables:
$ ID : num 10 20 30 40
$ Names: Factor w/ 4 levels "Elan","Ragav",..: 4 2 3 1
$ Store: logi TRUE FALSE TRUE FALSE
$ Value: num 2.5 8 10 7
## Select row 1 in column 2

df[1,2]
[1] Suren
Levels: Elan Ragav Siva Suren
## Select row 2 in column 2

df[2,2]
[1] Ragav
## Select Rows 1 to 3
df[1:3,]
ID Names Store Value
10 Suren TRUE 2.5
20 Ragav FALSE 8.0
30 Siva TRUE 10.0
## Select Columns 1
df[,1]
[1] 10 20 30 40
## Select Rows 1 to 3 and columns 3 to 4

df[1:3,3:4]
Store Value
TRUE 2.5
FALSE 8.0
TRUE 10.0
## Select Rows 1 to 3 and columns 2 to 4

df[1:3,2:4]
Names Store Value

Suren TRUE 2.5
Ragav FALSE 8.0
Siva TRUE 10.0
# Slice with columns name

df [, c('ID','Names')]
ID Names
10 Suren
20 Ragav
30 Siva
40 Elan
# Create a new vector
quantity <- c(10, 35, 40, 5)

df$quantity <- quantity
df
ID Names Store Value quantity
10 Suren TRUE 2.5 10
20 Ragav FALSE 8.0 35
30 Siva TRUE 10.0 40
40 Elan FALSE 7.0 5
# Add `quantity` to the `df` data frame

> quantity <- c(100,200,300)
> df$quantity <- quantity
Error in `$<-.data.frame`(`*tmp*`, quantity, value = c(100, 200, 300)) :

replacement has 3 rows, data has 4
> quantity <- c(100,200,300,500)

> df$quantity <- quantity
> df
1 10 Suren TRUE 2.5 100
2 20 Ragav FALSE 8.0 200
3 30 Siva TRUE 10.0 300
4 40 Elan FALSE 7.0 500
# select the columns

df$Store
[1] TRUE FALSE TRUE FALSE
df$Names
[1] Suren Ragav Siva Elan
# Select price above 100

subset(df, subset = quantity > 100)
20 Ragav FALSE 8 200
30 Siva TRUE 10 300
40 Elan FALSE 7 500
# Select price above and equal to 100

subset(df, subset = quantity >= 100)
10 Suren TRUE 2.5 100
20 Ragav FALSE 8.0 200
30 Siva TRUE 10.0 300
40 Elan FALSE 7.0 500
## Select Value above 5.0

subset(df, subset = Value > 5.0)
30 Siva TRUE 10 300
# Select Value above 10.0 and quantity above 100

(df <- df[with(df, quantity >= 100 & Value ==10.0), ])
30 Siva TRUE 10 300
# Select Value above 10.0 and quantity above 100

subset(df, subset = quantity >= 100 && Value ==10.0)
30 Siva TRUE 10 300

Basic

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Basic

Uploaded by

Copyright:

Available Formats

link

# First way to declare a variable: use the `<-`

# Second way to declare a variable: use the `=`

R vector. ... Vector is a basic data structure in R. It contains element of the

What is the difference between list and vector in R?

Factors in R are stored as a vector of integer values with a corresponding set of

HERE c is combining the values in list or vectors etc.,

ERROR without giving the c means combine.

> quantity <- (100,200,300,500)

# Take the sum of A_vector and B_vector

# Slice the first five rows of the vector

# Faster way to create adjacent value

# Print value strictly above 5

# Print value 5 & 6 alone

Note: It is possible to create more than two dimensions arrays with R.

nrow: Number of rows

ncol: Number of columns

> matrix_a <-matrix(1:10, byrow=TRUE, nrow=5,ncol=5)

> matrix_a <-matrix(1:30, byrow=TRUE, nrow=5,ncol=5)

> matrix_a <-matrix(1:30, byrow=FALSE, nrow=5,ncol=5) // byrow = FALSE

# Print dimension of the matrix with dim()

# concatenate c(1:5) to the matrix_a

> matric_1 <- matrix(1:60, byrow=FALSE, ncol=10, nrow = 10)

> matric_1 <- matrix(1:60, byrow=FALSE, ncol=10, nrow = 6)

> matric_1 <- matrix(1:30, byrow=FALSE, ncol=5, nrow = 5)

Factors in R. ... Factors in R are stored as a vector of integer values with a

# Create gender vector

# Convert gender_vector to a factor

It is important to transform a string into factor when we perform Machine Learning

Nominal Categorical Variable

# Create a color vector

# Convert the vector to factor

Ordinal Categorical Variable

# Create Ordinal categorical vector

# Convert `day_vector` to a factor with ordered level

[1] evening morning afternoon midday midnight evening

morning midday afternoon evening midnight

# creating the header name to the data frame

ID Names Store Value

# Print the structure

## Select row 1 in column 2

## Select row 2 in column 2

## Select Rows 1 to 3 and columns 3 to 4

## Select Rows 1 to 3 and columns 2 to 4

Names Store Value

# Slice with columns name

# Create a new vector

quantity <- c(10, 35, 40, 5)

# Add `quantity` to the `df` data frame

Error in `$<-.data.frame`(`*tmp*`, quantity, value = c(100, 200, 300)) :

> quantity <- c(100,200,300,500)

# select the columns

# Select price above 100

# Select price above and equal to 100

## Select Value above 5.0

# Select Value above 10.0 and quantity above 100

# Select Value above 10.0 and quantity above 100

You might also like

Error in `$<-.data.frame`(`tmp`, quantity, value = c(100, 200, 300)) :