• Users can select the data set in the dialog box or enter the
name of the data set (if they know it).
• Data can also be entered directly using the editor of R
Commander via Data->New Data Set. However, this works well
only when the data set is not too large.
Z <- c(3, 4, 4)
b <- 4
f <- function(a) {
  b <- 3
  b^3 + g(a)
}
g <- function(a) a * b
f(2)
The answer to the above code snippet is 35. The value of "a" passed
to the function is 2, and the value of "b" defined inside the function f(a)
is 3, so the output is 3^3 + g(2). The function g is defined in
the global environment, so, because of lexical scoping in R, it uses the
global value b = 4 rather than 3 and returns 2*4 = 8 to the function f. The
result is 3^3 + 8 = 35.
edit (MyTable)
The above code opens a spreadsheet-like data editor for entering data
into MyTable.
Selection Sort
Quick Sort
Bubble Sort
Merge Sort
16) What is the best way to use Hadoop and R together for
analysis?
HDFS can be used for long-term data storage. MapReduce
jobs submitted from Oozie, Pig, or Hive can be used to encode,
improve, and sample the data sets from HDFS into R. Complex
analysis tasks can then be run in R on the prepared subset of data.
if (is.na (a))
else if (a < 0)
else
invisible (a)
}
printmessage (NA)
print (filecontent)
41) How can you verify if a given object "X" is a matrix data
object?
If the function call is.matrix(X) returns TRUE, then X is a matrix
data object.
43) How can you verify if a given object “X” is a matrix data
object?
If the function call is.matrix(X) returns TRUE, then X is a matrix
data object; otherwise it is not.
44) How will you measure the probability of a binary
response variable in R language?
Logistic regression can be used for this. In R, the glm() function
provides it when called with family = binomial.
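As a minimal sketch of that answer, the code below fits a logistic regression with glm() and extracts fitted probabilities. The built-in mtcars data and the choice of am (transmission type) as the binary response are assumptions made only for illustration.

```r
# Assumed example data: mtcars ships with R; am is 0 (automatic) or 1 (manual).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# type = "response" returns probabilities rather than log-odds.
prob <- predict(fit, type = "response")
head(round(prob, 3))
```

Each element of prob is the estimated probability that the car has a manual transmission given its weight.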
ii. The other is to use %in%, which returns a logical value, either
TRUE or FALSE.
55) How will you list all the data sets available in all R
packages?
Use the following line of code:
data(package = .packages(all.available = TRUE))
56) Which function is used to create a histogram
visualisation in R programming language?
hist()
57) Write the syntax to set the path for current working
directory in R environment.
setwd("dir_path")
58) How will you drop variables using indices in a data
frame?
Let’s take a data frame:
df <- data.frame(v1 = c(1:5), v2 = c(2:6), v3 = c(3:7), v4 = c(4:8))
df
## v1 v2 v3 v4
## 1 1 2 3 4
## 2 2 3 4 5
## 3 3 4 5 6
## 4 4 5 6 7
## 5 5 6 7 8
df1<-df[-c(2,3)]
df1
## v1 v4
## 1 1 4
## 2 2 5
## 3 3 6
## 4 4 7
## 5 5 8
63) Write a function to extract the first name from the string
“Mr. Tom White”.
substr("Mr. Tom White", start = 5, stop = 7)
64) Can you tell if the equation given below is linear or not?
Emp_sal = 2000 + 2.5(emp_age)^2
Yes, it is a linear model: it is linear in its coefficients, even though
emp_age enters as a squared term.
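To illustrate why a squared predictor still gives a linear model, here is a hedged sketch that simulates data from the equation above and recovers the coefficients with lm(). The age range, the noise level, and the seed are assumptions chosen only for the demonstration.

```r
set.seed(1)                      # assumed seed, for reproducibility only
emp_age <- 21:60                 # hypothetical ages
# Simulate salaries from the stated equation plus some random noise.
emp_sal <- 2000 + 2.5 * emp_age^2 + rnorm(length(emp_age), sd = 50)

# I() protects ^ so it means "square" rather than formula crossing.
fit <- lm(emp_sal ~ I(emp_age^2))
coef(fit)
```

The model is fitted by ordinary least squares because the unknowns (intercept and slope) appear linearly; only the predictor is transformed.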
## 4: else
## ^
strsplit("contact@dezyre.com",split = ".")
Output of the strsplit function is -
## [[1]]
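The split argument of strsplit() is treated as a regular expression by default, so "." matches every character and the call above returns only empty strings. A small sketch of the difference, using the same address:

```r
email <- "contact@dezyre.com"

# "." as a regex matches any character, so every piece is empty.
regex_split <- strsplit(email, split = ".")

# fixed = TRUE treats "." literally; escaping it works as well.
fixed_split   <- strsplit(email, split = ".", fixed = TRUE)
escaped_split <- strsplit(email, split = "\\.")

regex_split[[1]]
fixed_split[[1]]
```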
all – The default value is FALSE, which means that only matching
rows are returned, resulting in an inner join. Set it to TRUE if you
want all the observations from data frames X and Y, resulting in an
outer join.
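A minimal sketch of the all argument of merge(), using two small made-up data frames (x, y, and their column names are illustrative only):

```r
x <- data.frame(ID = 1:3, score_x = c(10, 20, 30))
y <- data.frame(ID = 2:4, score_y = c(200, 300, 400))

# Default all = FALSE: only IDs present in both (inner join).
inner <- merge(x, y, by = "ID")

# all = TRUE: every ID from either side, NA where there is no match (outer join).
outer <- merge(x, y, by = "ID", all = TRUE)

inner
outer
```

The related arguments all.x = TRUE and all.y = TRUE give left and right joins respectively.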
tt <- sort(table(c("a", "b", "a", "a", "b", "c", "a1", "a1", "a1")), decreasing = TRUE)
depth <- 3
tt[1:depth]
Output -
## a a1 b
## 3 3 2
gender <- factor(c("M", "F", "M", "F", "F", "F"))
table(gender)
F M
4 2
F 4 66.67
M 2 33.33
Example –
gender = factor(c("f","m","m","f","m","f"))
y = table(gender)
cumsum(y)
## f m
## 3 6
1. What is R?
R is a programming language which is used for developing statistical
software and data analysis.
2. How are R commands written?
R commands are typed at the console prompt or saved in a script file. A line
that starts with # is a comment, for example #division, and is not executed.
3. What is a t-test in R?
It is used to determine whether the means of two groups are equal, using the
t.test() function.
4.What are the disadvantages of R Programming?
The disadvantages are:-
• Lack of standard GUI
• Not good for big data.
• Does not provide spreadsheet view of data.
5. What is the use of the with() and by() functions in R?
The with() function evaluates an expression using the columns of a dataset.
#with(data,expression)
The by() function applies a function to each level of a factor or factors.
#by(data,factorlist,function)
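A short sketch of both calls, assuming the built-in mtcars data as the dataset (the choice of columns is illustrative only):

```r
# with(): evaluate an expression using column names directly.
overall_mean <- with(mtcars, mean(mpg))

# by(): split the data frame by a factor and apply a function to each piece.
by_cyl <- by(mtcars, mtcars$cyl, function(d) mean(d$mpg))

overall_mean
by_cyl
```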
6. In R programming, how missing values are represented?
In R missing values are represented by NA which should be in capital letters.
7. What is the use of the subset() and sample() functions in R?
subset() is used to select variables and observations, and sample() is used
to draw a random sample of size n from a dataset.
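For illustration, a hedged sketch of both functions on the built-in mtcars data; the filter condition, the sample size of 5, and the seed are arbitrary assumptions.

```r
# subset(): keep rows matching a condition and only the listed columns.
heavy <- subset(mtcars, wt > 3.5, select = c(mpg, wt))

# sample(): draw 5 random row numbers, then use them to pick rows.
set.seed(42)                          # assumed seed, for reproducibility only
picked <- mtcars[sample(nrow(mtcars), 5), ]

heavy
picked
```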
8. Explain what is transpose?
Transpose is used for reshaping of the data which is used for analysis. Transpose
is performed by t() function.
9.What are the advantages of R?
The advantages are:-
• It is used for managing and manipulating of data.
• No license restrictions
• Free and open source software.
• Graphical capabilities of R are good.
• Runs on many Operating system and different hardware and also run on 32 &
64 bit processors etc.
10. What is the function used for adding datasets in R?
The rbind() function is used to add two datasets, but the columns of the two
datasets must be the same.
Syntax: rbind(x1, x2, ...) where x1, x2 are vectors, matrices, or data frames.
11. How can you produce correlations and covariances?
Correlations are produced by the cor() function and covariances by the cov() function.
12. What is the difference between a matrix and a data frame?
A data frame can contain different types of data, but a matrix can contain only
one type of data.
13. What is the difference between lapply and sapply?
lapply always returns its output as a list, whereas sapply simplifies the
output to a vector or matrix when possible.
14. What is the difference between seq(4) and seq_along(4)?
seq(4) produces the vector from 1 to 4 (c(1,2,3,4)), whereas seq_along(4)
produces a vector as long as its argument; since 4 has length 1, the result is c(1).
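The difference matters most for loops over vectors that may be empty. A small sketch (the vector names are made up for the example):

```r
seq(4)            # counts from 1 up to the value 4
seq_along(4)      # indexes the elements of its argument; 4 has one element

v <- c(10, 20, 30)
seq_along(v)      # one index per element of v

empty <- numeric(0)
seq_along(empty)  # zero-length result, so a for loop over it runs zero times
1:length(empty)   # counts from 1 down to 0, a common source of loop bugs
```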
15. Explain how you can start the R Commander GUI?
Loading the package with library(Rcmdr) starts the R Commander GUI.
16. What is the memory limit of R?
On a 32-bit system the memory limit is 3 GB, although most versions are limited
to 2 GB; on a 64-bit system the memory limit is 8 TB.
17. How many data structures does R have?
R has five data structures: vector, matrix, and array, which are homogeneous,
and list and data frame, which are heterogeneous.
18. Explain how data is aggregated in R?
Data is aggregated by collapsing it over one or more BY variables. When the
aggregate() function is used, the BY variables must be supplied in a list.
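A minimal sketch of aggregate() with the BY variable passed in a list, assuming the built-in mtcars data (the columns chosen are illustrative only):

```r
# Collapse mpg and hp to their means within each number of cylinders.
# The by argument must be a list; naming its element names the output column.
agg <- aggregate(mtcars[c("mpg", "hp")],
                 by = list(cyl = mtcars$cyl),
                 FUN = mean)
agg
```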
19. How many sorting algorithms are available?
Five sorting algorithms are commonly used:
• Bubble Sort
• Selection Sort
• Merge Sort
• Quick Sort
• Bucket Sort
20. How to create a new variable in R programming?
A new variable is created with the assignment operator '<-'.
E.g. mydata$sum <- mydata$x1 + mydata$x2
21.What are R packages?
Packages are the collections of data, R functions and compiled code in a well-
defined format and these packages are stored in library.
22.What is the workspace in R?
Workspace is the current R working environment which includes any user defined
objects like vector, lists etc.
23. What is the function which is used for merging data frames horizontally
in R?
The merge() function is used to merge two data frames horizontally.
E.g. Sum <- merge(dataframe1, dataframe2, by = "ID")
24. What is the function which is used for merging data frames vertically in
R?
The rbind() function is used to merge two data frames vertically.
E.g. Sum <- rbind(dataframe1, dataframe2)
25. What is power analysis?
It is used in experimental design to determine the sample size needed to
detect an effect of a given size.
26. Which package is used for power analysis in R?
The pwr package is used for power analysis in R.
27. Which method is used for exporting data in R?
There are many ways to export data into other formats such as SPSS, SAS,
Stata, and Excel spreadsheets.
28. Which packages are used for exporting data?
For Excel the xlsReadWrite package is used, and for SAS, SPSS, and Stata the
foreign package is used.
29. How are impossible values represented in R?
In R, NaN (not a number) is used to represent impossible values.
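A quick sketch of how NaN relates to the missing-value marker NA (the variable names are made up for the example):

```r
impossible <- 0 / 0   # an undefined arithmetic result
absent     <- NA      # a missing value

is.nan(impossible)    # TRUE: it is NaN
is.na(impossible)     # TRUE: NaN also counts as missing
is.nan(absent)        # FALSE: NA is missing but not NaN
```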
30. Which command is used for storing R objects into a file?
The save command is used for storing R objects into a file.
Syntax: >save(z, file = "z.Rdata")
31. Which command is used for restoring R objects from a file?
The load command is used for restoring R objects from a file.
Syntax: >load("z.Rdata")
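A round trip through both commands might look like the sketch below. Writing to the session's temporary directory is an assumption made here so that the example does not touch the working directory.

```r
z <- c(1, 2, 3)

# Assumed location: a file inside the session's temporary directory.
path <- file.path(tempdir(), "z.Rdata")

save(z, file = path)   # write the object, together with its name, to disk
rm(z)                  # remove it from the workspace
load(path)             # restores the object under its original name, z
z
```

Note that load() recreates the object under the name it was saved with, so no assignment is needed.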
32.What is the use of coin package in R?
coin package is used to achieve the re randomization or permutation based
statistical tests.
33.Which function is used for sorting in R?
order() function is used to perform the sorting.
34.What is the use of tapply?
tapply() applies a function to each group of values in a vector, where the
groups are defined by a factor.
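For question 34, a minimal sketch of tapply(), assuming the built-in mtcars data: it applies a function to the values of a vector within each group defined by a factor.

```r
# Mean miles per gallon within each number of cylinders.
means <- tapply(mtcars$mpg, mtcars$cyl, mean)
means
```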
35.What happens when the application object does not handle an event?
the event will be dispatched to your delegate for processing.
36.Explain app specific objects which store the app contents?
Data model objects are app specific objects and store app’s content. Apps can
also use document objects.
37.Explain the purpose of using UIWindow object?
UIWindow object coordinates the one or more views presenting on the screen.
38.Tell me the super class of all view controller objects?
UIViewController class.
39. How to create axes in the graph?
Custom axes are created using the axis() function.
40. What is the use of the abline() function?
The abline() function adds a reference line to a graph.
Syntax:- abline(h=yvalues, v=xvalues)
41.Why vcd package is used?
vcd package provides different methods for visualizing multivariate categorical
data.
42. What is GGobi?
GGobi is an open source program for visualization for exploring high dimensional
typed data.
43. What is iPlots?
It is a package which provides bar plots, mosaic plots, box plots, parallel plots,
scatter plots and histograms.
44. What is the use of the lattice package?
The lattice package improves on base R graphics by giving better defaults, and it
can easily display multivariate relationships.
45. What is fitdistr() function?
It is used to provide the maximum likelihood fitting of univariate distributions. It
is defined under the MASS package.
46.Which data structures are used to perform statistical analysis and create
graphs.
Data structures are vectors, arrays, data frames and matrices.
47. What is the use of the sink() function?
It redirects R output, for example to a file instead of the console.
48. Why is the library() function used?
Called without arguments, it shows the packages which are installed; called
with a package name, it loads that package.
49. Why is the search() function used?
It shows which packages are currently loaded.
50. On which types of data do binary operators work?
Binary operators work on matrices, vectors and scalars.
51. What is the use of the doBy package?
It is used to define the desired table using function and model formula.
52. Which function is used to create frequency table?
Frequency table is created by table() function.
53. Define the loglm() function.
The loglm() function is used to create log-linear models.
54. What is the use of the corrgram() function?
The corrgram() function is used to plot correlograms.
55. How to create scatterplot matrices?
The pairs() or splom() function is used to create scatterplot matrices.
56. What is npmc?
It is a package which gives nonparametric multiple comparisons.
57. What is the use of diagnostic plots?
It is used to check the normality, heteroscedasticity and influential observations.
58.Define anova() function.
anova() is used to compare the nested models.
59. What is the cv.lm() function?
It is defined in the DAAG package and is used for k-fold cross-validation.
60. Define the stepAIC() function.
It is defined in the MASS package and performs stepwise model selection
by exact AIC.
61. Define leaps().
It is used to perform the all-subsets regression and it is defined under the leaps
package.
62.Define relaimpo package.
It is used to measure the relative importance of each of the predictor in the
model.
63. Why is the car package used?
It provides a variety of regression tools, including scatter plots and variable
plots, as well as enhanced diagnostics.
64. Define robust package.
It provides a library of robust methods including regression.
65. What is robustbase?
It is a package which provides basic robust statistics including model selection
methods.
66. Define plotmeans().
It is defined in the gplots package and produces a mean plot for a single
factor, including confidence intervals.
67.What is the full form of MANOVA?
MANOVA stands for multivariate analysis of variance.
68. What is the use of MANOVA?
By using MANOVA we can test more than one dependent variable simultaneously.
69. Define mshapiro.test().
It is a function defined in the mvnormtest package. It performs the
Shapiro-Wilk test for multivariate normality.
70. Define bartlett.test().
bartlett.test() provides a parametric k-sample test of the equality of
variances.
71. What is fligner.test()?
It is a function which provides a non-parametric k-sample test of the equality of
variances.
72. Define hovplot().
It is defined in the HH package and provides a graphic test of homogeneity of
variance based on the Brown-Forsythe test.
73.Which variables are represented by lower case letters?
Numerical variables are represented by lower case letters.
74. Which variables are represented by upper case letters?
Categorical factors are represented by upper case letters.
75.What is logistic regression?
Logistic regression is used to predict the binary outcome from the given set of
continuous predictor variables.
76. Define Poisson regression.
It is used to predict an outcome variable that represents counts from a given
set of continuous predictor variables.
77. Define survival analysis.
It includes a number of techniques used for modeling the time to an event.
78. What is the use of the survfit() function?
It estimates a survival distribution for one or more groups.
79. Define survdiff().
It tests for differences in survival distributions between two or more groups.
80. What is coxph()?
It is a function used to model the hazard function on a set of predictor
variables.
81. In which package survival analysis is defined?
Survival analysis is defined under the survival package.
82.What is the use of MASS package?
MASS functions include those functions which performs linear and quadratic
discriminant function analysis.
83. Define qda().
qda() prints a quadratic discriminant function.
84.Define lda().
lda() is used to print the discriminant functions which is based on centered
variable.
85. What is the use of forecast package?
It provides the functions which are used for automatic selection of ARIMA and
exponential models.
86.Define auto.arima().
It is used to handle the seasonal as well as non-seasonal ARIMA models.
87. What is the principal() function?
It is defined in the psych package and is used to extract and rotate principal
components.
88.What is FactoMineR?
It is a package which includes quantitative and qualitative variables. It also
includes supplementary variables and observations.
89.What is the full form of CFA?
CFA stands for Confirmatory Factor Analysis.
90.What is the use of boot.sem() function?
It is used to bootstrap the structural equation model.
91.What is the full form of SEM?
SEM stands for Structural Equation Modeling.
92. Which function performs classical multidimensional scaling?
cmdscale() function is used to perform classical multidimensional scaling.
93.Define isoMDS().
This function is defined under the MASS package which performs nonmetric
multidimensional scaling.
94.Which function perform individual difference scaling?
It is done by indscal() function.
95. What is pvclust() function ?
It comes under the pvclust package which provides p-values for hierarchical
clustering .
96. Define cluster.stats().
It is defined in the fpc package and provides a method for comparing the
similarity of two cluster solutions using a variety of validation criteria.
97. Why do we use the party package?
It provides non-parametric regression for ordinal, nominal, censored
and multivariate responses.
98. Which package provides bootstrapping?
The boot package provides bootstrapping.
99. Define the matlab package.
The matlab package includes wrapper functions and variables which are used to
replicate MATLAB function calls.
100. What is the use of the Matrix package?
The Matrix package includes functions which support sparse and dense
matrices and give access to libraries such as LAPACK and BLAS.
R Programming: 35 Job Interview Questions and Answers
Posted by Laetitia Van Cauwenberge on December 6, 2015 at 9:00am
Read the questions. At the bottom, you will find a link to the answers.
The Questions
First Set
1. Explain what is R?
2. List out some of the function that R provides?
3. Explain how you can start the R commander GUI?
4. In R how you can import Data?
5. Mention what does not ‘R’ language do?
6. Explain how R commands are written?
7. How can you save your data in R?
8. Mention how you can produce co-relations and covariances?
9. Explain what is t-tests in R?
10. Explain what is With () and By () function in R is used for?
11. What are the data structures in R that is used to perform statistical analyses and
create graphs?
12. Explain general format of Matrices in R?
13. In R how missing values are represented ?
14. Explain what is transpose?
15. Explain how data is aggregated in R?
16. What is the function used for adding datasets in R?
17. What is the use of subset() function and sample() function in R ?
18. Explain how you can create a table in R without external file?
You can find the answers here.
Second Set
1. Data structure -- How many data structures R has? How do you build a binary
search tree in R?
2. Sorting -- How many sorting algorithms are available? Show me an example in R.
3. Low level -- How do you build a R function powered by C?
4. String -- How do you implement string operation in R?
5. Vectorization -- If you want to do Monte Carlo simulation by R, how do you improve
the efficiency?
6. Function -- How do you take function as argument of another function? What is the
apply() function family?
7. Threading -- How do you do multi-threading in R?
8. Memory limit and database -- What is the memory limit of R? How do you avoid it?
How do you use SQL in R?
9. Testing -- How do you do testing and debugging in R?
10. Software development -- How do you develop a package? How do you do version
control?
You can find the answers here.
Third Set
1. If I have a data.frame df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c(7, 8,
9))...
How do I select the c(4, 5, 6)?
How do I select the 1?
How do I select the 5?
What is df[, 3]?
What is df[1,]?
What is df[2, 2]?
2. What is the difference between a matrix and a dataframe?
3. If I concatenate a number and a character together, what will the class of the resulting
vector be?
4. What if I concatenate a number and a logical?
5. What if I concatenate a number and NA?
6. What is the difference between sapply and lapply? When should you use one versus
the other? Bonus: When should you use vapply?
7. What is the difference between seq(4) and seq_along(4)?
8. What is f(3) where:
y <- 5
f <- function(x) { y <- 2; y^2 + g(x) }
g <- function(x) { x + y }
Why?
9. I want to know all the values in c(1, 4, 5, 9, 10) that are not in c(1, 5, 10, 11, 13).
How do I do that with one built-in function in R? How could I do it if that function didn't exist?
10. Can you write me a function in R that replaces all missing values of a vector with the
mean of that vector?
11. How do you test R code? Can you write a test for the function you wrote in #6?
12. Say I have...
fn <- function(a, b, c, d, e) a + b * c - d / e
How do I call fn on the vector c(1, 2, 3, 4, 5) so that I get the same result as fn(1, 2,
3, 4, 5)? (No need to tell me the result, just how to do it.)
13. dplyr <- "ggplot2"
library(dplyr)
Why does the dplyr package get loaded and not ggplot2?
14. mystery_method <- function(x) { function(z) Reduce(function(y, w) w(y), x, z) }
fn <- mystery_method(c(function(x) x + 1, function(x) x * x))
fn(3)
What is the value of fn(3)? Can you explain what is happening at each step?
1.) If I have a data.frame df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c(7, 8, 9))...
1a.) How do I select the c(4, 5, 6)?
1b.) How do I select the 1?
1c.) How do I select the 5?
1d.) What is df[, 3]?
1e.) What is df[1,]?
1f.) What is df[2, 2]?
Answers: (a) df[[2]] or df$b, (b) df[[1]][[1]] or df$a[[1]], (c) df[[2]][[2]] or df$b[[2]],
(d) 7 8 9, (e) 1 4 7, (f) 5.
2.) What is the difference between a matrix and a dataframe?
3a.) If I concatenate a number and a character together, what will the class
of the resulting vector be?
4.) What is the difference between sapply and lapply? When should you use
one versus the other? Bonus: When should you use vapply?
Answer: Use lapply when you want the output to be a list, and sapply when you
want the output simplified to a vector or matrix. Generally vapply is preferred
over sapply because you can specify the output type of vapply (but not sapply).
The drawback is that vapply is more verbose and harder to use.
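The three functions can be compared side by side in a short sketch (the list x is made up for the example). The empty-input case at the end shows why a declared output type can matter.

```r
x <- list(a = 1:3, b = 4:6)

l <- lapply(x, mean)               # always a list
s <- sapply(x, mean)               # simplified to a named numeric vector here
v <- vapply(x, mean, numeric(1))   # same result, but the type is declared

# With empty input, sapply falls back to a list while vapply keeps its type.
s_empty <- sapply(list(), mean)
v_empty <- vapply(list(), mean, numeric(1))
```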
5.) What is the difference between seq(4) and seq_along(4)?
Answer: seq(4) produces a vector from 1 to 4 (c(1, 2, 3, 4)),
whereas seq_along(4) produces a vector as long as its argument; 4 has length 1,
so the result is c(1).
6.) What is f(3) where:
y <- 5
f <- function(x) { y <- 2; y^2 + g(x) }
g <- function(x) { x + y }
Why?
Answer:
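The result can be checked by running the definitions. Because R uses lexical scoping, g looks up y in the environment where g was defined (where y is 5), not in the body of f (where y is 2).

```r
y <- 5
f <- function(x) { y <- 2; y^2 + g(x) }
g <- function(x) { x + y }

# Inside f: y^2 is 2^2 = 4. Inside g: x + y is 3 + 5 = 8.
result <- f(3)
result   # 12
```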
9.) How do you test R code? Can you write a test for the function you wrote
in #6?
Answer: You can use Hadley's testthat package. A test might look like this:
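A hedged sketch of such a test follows. It assumes the function in question is the one that replaces missing values of a vector with the vector's mean; the name impute_mean is hypothetical, and the test only runs if the testthat package happens to be installed.

```r
# Hypothetical function under test: fill NA entries with the mean of the rest.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Guarded so the script still runs where testthat is not installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("missing values are replaced by the mean", {
    testthat::expect_equal(impute_mean(c(1, NA, 3)), c(1, 2, 3))
    testthat::expect_false(anyNA(impute_mean(c(NA, 4, 6))))
  })
}
```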
10.) Say I have...
fn <- function(a, b, c, d, e) a + b * c - d / e
How do I call fn on the vector c(1, 2, 3, 4, 5) so that I get the same result as fn(1,
2, 3, 4, 5)? (No need to tell me the result, just how to do it.)
Answer: do.call(fn, as.list(c(1, 2, 3, 4, 5)))
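11.)
dplyr <- "ggplot2"
library(dplyr)
Why does the dplyr package get loaded and not ggplot2?
Answer: library() does not evaluate its argument by default. It captures the
name as written, in the manner of deparse(substitute(dplyr)), so it loads the
package literally called dplyr. To load the package named by the variable, call
library(dplyr, character.only = TRUE).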
12.)
What is the value of fn(3)? Can you explain what is happening at each step?
Answer:
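Running the code from the question shows the value. mystery_method returns a function that feeds its argument through each function in x in turn: Reduce() starts from z and at every step calls the next function w on the running value y.

```r
mystery_method <- function(x) {
  function(z) Reduce(function(y, w) w(y), x, z)
}
fn <- mystery_method(c(function(x) x + 1, function(x) x * x))

# Step 1: start with 3 and apply x + 1, giving 4.
# Step 2: apply x * x to 4, giving 16.
result <- fn(3)

# accumulate = TRUE exposes the intermediate values.
fns <- c(function(x) x + 1, function(x) x * x)
steps <- Reduce(function(y, w) w(y), fns, 3, accumulate = TRUE)
```

In other words, the returned function is the composition of the supplied functions, applied left to right.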
Object Class
Example 2 :
xx <- data.frame(var1=c(1:5))
class(xx)
It returns "data.frame".
str(xx) returns 'data.frame' : 5 obs. of 1 variable: $ var1: int
2. What is the use of mode() function?
It returns the storage mode of an object.
x <- factor(1:5)
mode(x)
The above mode function returns numeric.
Mode Function
x <- data.frame(var1=c(1:5))
mode(x)
It returns list.
gender = factor(c("m","f","f","m","f","f"))
table(gender)
Output
If you want to include the % of values in each group, you can store the result
in a data frame using the data.frame function and then calculate the column
percent.
t = data.frame(table(gender))
t$percent= round(t$Freq / sum(t$Freq)*100,2)
Frequency Distribution
gender = factor(c("m","f","f","m","f","f"))
x = table(gender)
cumsum(x)
Cumulative Sum
If you want to see the cumulative percentage of values, see the code
below :
t = data.frame(table(gender))
t$cumfreq = cumsum(t$Freq)
t$cumpercent= round(t$cumfreq / sum(t$Freq)*100,2)
x <- c(4,5,6)
y <- c(2,3)
Multiplication of vectors
z=cbind(x,y)
cbind : Output
rbind : Output
When using the cbind() function, make sure both datasets have the same number
of rows. When using the rbind() function, make sure both the number and the
names of the columns are the same. For data frames, rbind() matches columns by
name and stops with an error if the names differ; for matrices it binds by
position, so mismatched columns can silently put values in the wrong place.
library(dplyr)
combdf = bind_rows(df,df2)
library(dplyr)
df %>% mutate((x+y) + (x-y))
by() function in R
The by() function is equivalent to group by function in SQL. It is used to
perform calculation by a factor or a categorical variable. In the example
below, we are computing mean of variable var2 by a factor var1.
df = data.frame(var1=factor(c(1,2,1,2,1,2)), var2=c(10:15))
with(df, by(df, var1, function(x) mean(x$var2)))
The group_by() function in the dplyr package can perform the same task.
library(dplyr)
df %>% group_by(var1)%>% summarise(mean(var2))
df = data.frame(var1=c(1:5))
colnames(df)[colnames(df) == 'var1'] <- 'variable1'
The rename() function in dplyr package can also be used to rename a
variable.
library(dplyr)
df= rename(df, variable1=var1)
16. What is the use of which() function in R?
The which() function returns the position of elements of a logical vector
that are TRUE. In the example below, we are figuring out the row number
wherein the maximum value of a variable x is recorded.
mydata=data.frame(x = c(1,3,10,5,7))
which(mydata$x==max(mydata$x))
It returns 3 as 10 is the maximum value and it is at 3rd row in the variable x.
library(dplyr)
data %>% mutate(var=coalesce(X,Y,Z))
COALESCE Function in R
Output
mult(2)
[1] 4
Answer : The value of 'x' will remain 3.
Output
x=3
mult <- function(j)
{
x <<- j * 2
return(x)
}
mult(2)
x
The operator "<<-" tells R to search in the parent environment for an
existing definition of the variable we want to be assigned.
If you want to change the default single space separator, you can add
sep="," keyword to include comma as a separator.
x = "AXZ2016"
substr(x,1,3)
dt2 = read.table(text="
var
Sandy,Jones
Dave,Jon,Jhonson
", header=TRUE)
The word() function of stringr package is used to extract or scan word from a
string. -1 in the second parameter denotes the last word.
library(stringr)
dt2$var2 = word(dt2$var, -1, sep = ",")
df1=data.frame(ID=c(1:5), Score=runif(5,50,100))
df2=data.frame(ID=c(3,5,7:9), Score2=runif(5,1,100))
comb = merge(df1, df2, by ="ID", all.x = TRUE)
Left Join (SQL Style)
library(sqldf)
comb = sqldf('select df1.*, df2.* from df1 left join df2 on df1.ID = df2.ID')
df1=data.frame(ID=c(1:5), Score=c(50:54))
df2=data.frame(ID=c(3,5,7:9), Score=c(52,60:63))
library(dplyr)
comb = intersect(df1,df2)
library(sqldf)
comb2 = sqldf('select * from df1 intersect select * from df2 ')
R Base Method
library(tictoc)
tic()
runif(5555,1,1000)
toc()
32. Which package is generally used for fast
data manipulation on large datasets?
The package data.table performs fast data manipulation on large datasets.
See the comparison between dplyr and data.table.
# Load data
library(nycflights13)
data(flights)
df = setDT(flights)
Result : The data.table package took 0.04 seconds, whereas the dplyr package
took 0.07 seconds, so data.table is approx. 40% faster than dplyr. Since the
dataset used in the example is of medium size, there is no noticeable
difference between the two. As the size of the data grows, the difference in
execution time gets bigger.
In the first case, it created two vectors v1 and v2 and a data frame temp
which has 2 variables with improper variable names. The second code
creates a data frame temp with proper variable names.
1. Bubble Sort
2. Selection Sort
3. Merge Sort
4. Quick Sort
5. Bucket Sort
R Base Method
df = subset(mydata, select = -c(x,y,z))
With dplyr package
library(dplyr)
df = select(mydata, -c(x,y,z))
To create a new data without any missing value, you can use the code
below :
df <- na.omit(mydata)
library(lubridate)
interval(dates[1], dates[2]) %/% hours(1)
interval(dates[1], dates[2]) %/% days(1)
interval(dates[1], dates[2]) %/% weeks(1)
interval(dates[1], dates[2]) %/% months(1)
interval(dates[1], dates[2]) %/% years(1)
The months unit is not available in the base difftime() function, so
we can use the interval() function of the lubridate package.
Example :
set.seed(1234)
x = sample(1:50, 10)
x
[1] 6 31 30 48 40 29 1 10 28 22
sort(x)
[1] 1 6 10 22 28 29 30 31 40 48
rank(x)
[1] 2 8 7 10 9 6 1 3 5 4
2 implies the number in the first position is the second lowest and 8 implies
the number in the second position is the eighth lowest.
order(x)
[1] 7 1 8 10 9 6 3 2 5 4
7 implies the 7th value of x is the smallest value, so 7 is the first element of
order(x), and 1 means the first value of x is the second smallest.
If you run x[order(x)], it would give you the same result as sort() function.
The difference between these two functions lies in two or more dimensions
of data (two or more columns). In other words, the sort() function cannot be
used for more than 1 dimension whereas x[order(x)] can be used.
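A brief sketch of that difference, using the values of x shown above and a small made-up data frame (df and its columns are illustrative only):

```r
x <- c(6, 31, 30, 48, 40, 29, 1, 10, 28, 22)

# For a single vector the two approaches agree.
same <- identical(x[order(x)], sort(x))

# order() accepts several keys, so it can sort the rows of a data frame:
# here by grp ascending, then by val descending within each group.
df <- data.frame(grp = c("b", "a", "b", "a"), val = c(2, 9, 1, 5))
sorted <- df[order(df$grp, -df$val), ]
sorted
```

Negating a numeric key is one simple way to reverse its direction while the other key stays ascending.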