• ID = participant ID
• age = age group (1 = young, 2 = old)
• education = education group (1 = comprehensive school, 2 = secondary,
3 = higher)
1.3 Describe your data
Show descriptive statistics for your numeric variables in a table. Useful
functions: summary, describe.
Plot the test scores for the Digit symbol and Benton tasks at all time points.
At this point, you can simply draw box plots separately for each time point.
Remember to adjust the scales so that the scores are comparable (tip: use the
named argument ylim= in your boxplot call).
Finally, attach the deprivation data using attach.
(Refer to Demo 2 for help.)
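A minimal sketch of the steps above, assuming the data sit in a data frame called dep (a hypothetical name) with columns digitsymbol_1 to digitsymbol_3. The data frame is filled with simulated values here so that the block runs on its own; replace it with the real data.

```r
# Simulated stand-in for the real data (hypothetical values, illustration only)
set.seed(1)
dep <- data.frame(
  digitsymbol_1 = rnorm(30, mean = 50, sd = 10),
  digitsymbol_2 = rnorm(30, mean = 45, sd = 10),
  digitsymbol_3 = rnorm(30, mean = 40, sd = 10)
)

# Descriptive statistics for every column
print(summary(dep))

# Common y-axis limits across the three time points
yrange <- range(unlist(dep))

# One box plot per time point, all drawn on the same scale
layout(matrix(1:3, 1, 3))
boxplot(dep$digitsymbol_1, ylim = yrange, main = "Digit symbol 1")
boxplot(dep$digitsymbol_2, ylim = yrange, main = "Digit symbol 2")
boxplot(dep$digitsymbol_3, ylim = yrange, main = "Digit symbol 3")
layout(1)  # reset the layout
```

After attach(dep), the columns can be referred to by name without the dep$ prefix.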
2 Primer: distributions
Next, we will run a few simulations to demonstrate what different distributions
look like. Do not worry too much about the commands we use to run the
simulations; the important thing is to focus on the behaviour and
characteristics of the different distributions.
What about the distribution of sample means, based on taking just five
uniformly distributed random numbers?
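One way to explore this is sketched below. The number of repetitions (1000) and the variable names are assumptions made for illustration.

```r
set.seed(1)
nsamples <- 5      # numbers per sample
ntimes   <- 1000   # number of samples (assumed value)

# Each column holds one sample of five uniform random numbers on [0, 1]
datamat <- matrix(runif(nsamples * ntimes), nsamples, ntimes)

# One mean per column, i.e. one mean per sample
means <- colMeans(datamat)

# Histogram of the 1000 sample means
hist(means, main = "Means of 5 uniform random numbers")
```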
Let’s add a normal curve:
# create a vector
> simu <- seq(-4,4,0.01)
# draw a normal distribution:
> plot(simu, dnorm(simu), type="l", lty=2,
ylab="Probability density", xlab="Deviates")
# add different t-distributions:
> lines(simu, dt(simu, df=5), col="red") # df = 5 (sample size 6 for a one-sample t)
> lines(simu, dt(simu, df=10), col="green") # df = 10 (sample size 11 for a one-sample t)
# Etc. Play around by increasing the degrees of freedom.
# What happens when you get close to or beyond 30?
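The same comparison can also be made numerically. The sketch below compares the 97.5% quantile of the t-distribution with that of the normal distribution; the chosen degrees of freedom are arbitrary.

```r
dfs <- c(5, 10, 30, 100)       # degrees of freedom to compare (arbitrary choice)
t_q <- qt(0.975, df = dfs)     # 97.5% quantiles of the t-distributions
z_q <- qnorm(0.975)            # 97.5% quantile of the standard normal

print(data.frame(df = dfs,
                 t_quantile = round(t_q, 3),
                 normal_quantile = round(z_q, 3)))
```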
2.3 F-distribution
Another important distribution is the F-distribution. It is especially relevant
for ANOVA, since the ratio of two variance estimates (each a sum of squares
divided by its degrees of freedom) follows an F-distribution when the null
hypothesis holds. The F-distribution depends on two parameters, which are the
degrees of freedom (DF) of the two quotients we are dividing. We can
demonstrate the behaviour of the F-distribution:
# create a vector
> simu2 <- seq(0,4,0.01)
# draw an F-distribution
> plot(simu2, df(simu2,1,1), type="l", lty=2,
ylab="Probability density", xlab="Deviates")
# try changing degrees of freedom:
> lines(simu2, df(simu2, 1,10), col="red") # DF1 = 1, DF2 = 10
> lines(simu2, df(simu2, 10,10), col="green") # DF1 = 10, DF2 = 10
# Etc. Play around by increasing both degrees of freedom separately.
In Section 5, we will see how the F-distribution relates to statistical testing
of hypotheses.
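To see where the F-distribution comes from, the sketch below simulates the ratio of two sample variances from independent, normally distributed samples and compares it with the theoretical F-distribution. The sample sizes, the number of repetitions and the variable names are assumptions chosen for illustration.

```r
set.seed(1)
n1 <- 6; n2 <- 11      # sample sizes (arbitrary), giving DF1 = 5 and DF2 = 10
ntimes <- 10000        # number of simulated ratios

# Ratio of two sample variances from independent standard normal samples
ratios <- replicate(ntimes, var(rnorm(n1)) / var(rnorm(n2)))

# Compare simulated and theoretical quantiles
probs <- c(0.25, 0.5, 0.75, 0.95)
print(rbind(simulated   = quantile(ratios, probs),
            theoretical = qf(probs, df1 = n1 - 1, df2 = n2 - 1)))

# Histogram on the density scale; the last bin collects all ratios above 4
hist(ratios, breaks = c(seq(0, 4, 0.1), max(ratios, 4.1)), xlim = c(0, 4),
     freq = FALSE, main = "Ratio of two sample variances", xlab = "Ratio")

# Overlay the theoretical F density with DF1 = 5 and DF2 = 10
curve(df(x, n1 - 1, n2 - 1), from = 0, to = 4, add = TRUE, col = "red")
```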
Question 1 Generate 100 samples that follow the normal distribution. Use
rnorm() for this, which by default generates numbers with zero mean and unit
standard deviation. Estimate the mean and the standard deviation. What do
you observe? Why?
Question 2 Repeat the same but for a normal distribution with mean equal
to 3 and standard deviation equal to 2. Check ?rnorm if you are unsure how
to do this. Plot a histogram of the data and put the mean and the standard
deviation in the title of the figure. You can do it using:
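A minimal sketch of such a call, assuming the sample is stored in a variable named x (a hypothetical name; the wording of the title is also only an example):

```r
set.seed(1)
x <- rnorm(100, mean = 3, sd = 2)   # 100 numbers with mean 3 and SD 2

# Build the title from text and the estimated values
ttl <- paste("Mean:", mean(x), "SD:", sd(x))

hist(x, main = ttl)
```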
The command paste() concatenates strings so that they appear in the same
line. What do you see? Do we really need all these decimal numbers?
Question 3 The function round() can help us solve the problem that we
faced in Question 2. For example, running round(data,2) will keep only two
decimals. Repeat the histogram by keeping only two decimals for the mean and
the standard deviation.
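As a small illustration of how round() combines with paste() (the numbers are arbitrary examples):

```r
r1 <- round(pi, 2)                          # 3.14
r2 <- paste("Mean:", round(2.98765, 2))     # "Mean: 2.99"
print(r1)
print(r2)
```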
Question 4 We saw that the estimated mean differs from the theoretical
mean when the sample size is small. What happens when different sample
sizes are tested? Estimate the mean of samples of 1, 10, 100 and 1000 normally
distributed numbers. Repeat this 1000 times for each sample size and plot the
histograms in a 2x2 plot.
Remember: to make a 2x2 plot layout use:
layout(matrix(c(1:4),2,2)) # Make a 2x2 plot matrix
Then to make one histogram try this:
nsamples=1 # 1 sample
ntimes=1000 # Repeated 1000 times
# Make the matrix that contains the samples
datamat<-matrix(rnorm(nsamples*ntimes),nsamples,ntimes)
#Estimate the mean over the columns
means<-apply(datamat,2,mean)
# Make the histogram with xlim so that all histograms share the same x-axis limits
hist(means,xlim=c(-3,3))
# Repeat that for more samples
Repeat the process to make all four histograms. Add an informative title.
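Instead of copying the block four times, the steps can be wrapped in a loop. The following is one possible sketch; the variable names sizes and sds are assumptions introduced for illustration.

```r
set.seed(1)
ntimes <- 1000                    # repetitions per sample size
sizes  <- c(1, 10, 100, 1000)     # sample sizes to compare
sds    <- numeric(length(sizes))  # spread of the means for each sample size

layout(matrix(1:4, 2, 2))         # 2x2 plot matrix
for (i in seq_along(sizes)) {
  n <- sizes[i]
  # One column per repetition, n rows per column
  datamat <- matrix(rnorm(n * ntimes), n, ntimes)
  means <- colMeans(datamat)
  sds[i] <- sd(means)
  hist(means, xlim = c(-3, 3), main = paste("Sample size:", n))
}
layout(1)                         # reset the layout

# Standard deviation of the estimated means for each sample size
print(round(sds, 3))
```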
Question 5 What happens when the sample size increases?
Question 6 If you throw a 6-sided die, you get a random number from 1 to 6
with equal probability (i.e. uniformly distributed). What does the distribution
of the sum of 2, 10 and 50 dice look like? Test this by repeating the experiment
10000 times, similarly to Question 4. You can use the following example:
nsamples=5 # Number of dice (samples)
ntimes=1000 # Number of repetitions
# Generate continuous numbers from the uniform distribution from 1 to 7 ...
samples<-runif(nsamples*ntimes,1,7)
# ... and round them down to the nearest integer (1 to 6)
samples2<-floor(samples)
# Make the data matrix: one row per die, one column per repetition
datamat<-matrix(samples2,nsamples,ntimes)
# Sum within each column, i.e. over the dice of one repetition
sums<-apply(datamat,2,sum)
# Plot the histogram with bin edges halfway between the integers
hist(sums,breaks=seq(min(sums)-0.5,max(sums)+0.5))
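An alternative sketch draws the integers directly with sample() instead of rounding uniform numbers. The variable names ndice and rolls are assumptions introduced here.

```r
set.seed(1)
ndice  <- 5      # number of dice
ntimes <- 1000   # number of repetitions

# Draw integers 1 to 6 with equal probability, with replacement
rolls <- sample(1:6, ndice * ntimes, replace = TRUE)

# One row per die, one column per repetition
datamat <- matrix(rolls, ndice, ntimes)

# Sum of the dice in each repetition
sums <- colSums(datamat)

hist(sums, breaks = seq(min(sums) - 0.5, max(sums) + 0.5))
```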
# Plot all numeric variables in same layout matrix:
> layout(matrix(c(1,2,3,4,5,6), 3, 2))
> hist(digitsymbol_1, main="Digit symbol 1")
> hist(digitsymbol_2, main="Digit symbol 2")
> hist(digitsymbol_3, main="Digit symbol 3")
> hist(bentonerror_1, main="Benton error 1")
> hist(bentonerror_2, main="Benton error 2")
> hist(bentonerror_3, main="Benton error 3")
Also, you can draw “quantile-quantile plots” of your variables. If the points in
the qq-plot lie close to the line, the data are approximately normally distributed.
> qqnorm(digitsymbol_1)
> qqline(digitsymbol_1)
3.1 Exercises
Test for normality in all our numeric variables (Digit symbol task at time points
1, 2, and 3; Benton task at time points 1, 2, and 3):
Question 9 Perform normality test using Shapiro-Wilk test for each numeric
variable. Report the W values that are returned by the test for each case.
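A minimal sketch of how the test can be run and how the W value can be extracted. The vector x is a simulated stand-in (hypothetical data); use one of your own variables instead.

```r
set.seed(1)
x <- rnorm(50)               # hypothetical stand-in for one score variable

res <- shapiro.test(x)       # Shapiro-Wilk normality test
print(res)

W <- unname(res$statistic)   # the W value
p <- res$p.value             # the corresponding p-value
```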
Question 11 Compare the results between the two different normality tests.
bins=seq(0,15,1)
# Both histograms use the same breaks, so their bars line up
p1 <- hist(Data1,breaks=bins)
p2 <- hist(Data2,breaks=bins)
plot( p1, col=rgb(0,0,1,1/4)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), add=TRUE) # second histogram, overlaid on the first