CEE 3040 - UNCERTAINTY ANALYSIS IN ENGINEERING

Topics for Prelim exam: Continuous random variables including LN/Gumbel/Weibull (4.1-4.5), Multivariate Random Variables (5.1-5.2, 5.3-5.5), Estimators (Ch. 6: 6.1-6.2), Confidence intervals (7.1-7.3, but not CI for proportions or in 7.4), and Simple Hypothesis Testing (8.1-8.2, 8.4-8.5; only one-sample tests).
___________________________________________________________________________

Homework #9
Due: Monday Nov. 4, 2013
Read: Devore 7.1-7.3, 8.1-8.2 (We neglect discussions of proportions.)

Goal: We have discussed various estimators. This assignment addresses the meaning of confidence intervals and how they are computed. A vehicle for developing this understanding is the Monte Carlo simulation capability of R and other statistical packages. That capability allows you to experiment with your own random samples, and to observe the sampling properties of the sample mean and sample variance. The last two problems provide an introduction to hypothesis testing, the focus of this week's lectures. All this material is on the 2nd prelim.

Assignment

These assignments are written to use R, free software available on the Internet. Please see the attached handout for more details and tutorial information. Start early on the R problems; they should be fun. Please write your answers for the computer assignments on a separate sheet of paper (or cut and paste the results) so that we can grade your work easily. To minimize the amount of paper generated by this assignment, the text indicates where graphs need to be submitted with the homework.

Problems 1-5 use the random number generation capability of R to illustrate empirically the properties of the sample mean and sample variance, and of confidence intervals. The attached R instructions supply the information you need.

1) Suppose that contaminant concentrations are normally distributed with mean $\mu = 100$ and standard deviation $\sigma = 30$.
Consider the properties of small samples of 25 independent observations drawn from this distribution. Use the R command rnorm to obtain your own unique sets of random numbers. For example, to get one sample from $N(100, 30^2)$, you could type (note that $\sigma$, not $\sigma^2$, is passed to R):

    > x <- rnorm(25, 100, 30)

To generate ten different sets of 25 independent normal random numbers using R, and to store your samples for subsequent computations, enter the R commands:

    > x <- rnorm(25, 100, 30)
    > for(i in 2:10){x <- matrix(c(x, rnorm(25, 100, 30)), nrow = 25, ncol = i)}

OR

    > x <- replicate(10, rnorm(25, 100, 30))

Both approaches store the random data sets in the matrix x[1:25, 1:10]. What do you get with the command mean(x[1:25, 2])?

Have R calculate the ten sample means and ten sample variances of the ten samples using the R commands mean() and var(). Here is an R solution if done all at once:

    > xbar <- vector(mode = "numeric", length = 10)
    > s2 <- vector(mode = "numeric", length = 10)
    > for(i in 1:10) cat("mean =", xbar[i] <- mean(x[1:25, i]), "var =", s2[i] <- var(x[1:25, i]), "\n")

where cat() concatenates and prints its arguments.

NOW generate a dotplot for each of the ten samples using dotPlot in the supplemental R package BHH2. {OPTIONAL: You can also try hist and compare with the dotPlot results.}

To make package BHH2 available in R:
(i) In the R environment (RGui), select the menu item Packages > Install package(s) (Mac: Package Installer), choose a CRAN mirror, and choose BHH2 from the package list.
(ii) Then load the package (you may need to repeat this each time you run R). RGui: select the menu item Packages > Load package (Mac: Package Manager), then choose BHH2 (Mac: check "load").

The command to plot the second sample is (run it once per sample, changing the column index, to plot all ten):

    > dotPlot(x[1:25, 2], xlim = c(0, 200), xlab = "random number")

WHERE: "\n" ends the printed line in cat(); xlim sets the limits of the x-axis range.

For this problem, submit a summary of the ten averages and variances computed for each sample, as well as the dotPlot for at least one sample.
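As an aside to problem 1, the ten sample means and variances can also be computed without an explicit loop. The following is a minimal sketch and not part of the original handout: the seed value 3040 is an arbitrary assumption, added only so that the run is reproducible, and colMeans() and apply() are base R functions.

```r
# Sketch only: the seed is an arbitrary choice (an assumption, not from the
# handout) so that the same random samples are drawn on every run.
set.seed(3040)

# Ten samples of 25 independent N(100, 30^2) observations, one per column.
x <- replicate(10, rnorm(25, mean = 100, sd = 30))

# Column-wise sample means and sample variances (var() divides by n - 1).
xbar <- colMeans(x)
s2 <- apply(x, 2, var)

# One row per sample: sample number, mean, variance.
print(data.frame(sample = 1:10, mean = xbar, var = s2))
```

Removing the set.seed() line gives each student a unique set of samples, which is what the assignment asks for.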
(Note: The command window can be saved as a .txt file showing all commands used and the non-graphic output.)

2) (a) Consider now all ten sample averages. If you haven't already created an xbar vector with all ten sample averages, do so as follows:

    > xbar <- vector(length = 10)

Put all of the sample averages in a single vector using

    > for(i in 1:10){xbar[i] <- mean(x[1:25, i])}
Make a DOTPLOT of the ten calculated sample averages and turn it in with your assignment.

    > dotPlot(xbar, xlab = "sample average")

(b) Suppose that, before going to the computer, you had regarded the average A computed from a sample of 25 normal random variables as a new derived random variable, where

$A = \bar{X} = \frac{1}{25}\left[ X_1 + X_2 + \cdots + X_{25} \right]$.

What are the population mean of A, $\mu_A = E[A]$, and the variance, $\sigma_A^2 = Var[A]$?

(c) You have 10 sample average values (realizations of A). Please compute the sample average $\bar{A}$ and sample variance $S_A^2$ of those ten values of A. (In other words, compute the average and variance of the sample averages computed in Problem 1.) (Use R COMMAND.)

    > mean(xbar)
    > var(xbar)

(d) Why is it that the expectation of the averages, $\mu_A = E[A]$, is different from the sample average $\bar{A}$ of the ten samples that you generated?

3) (a) Suppose that, before going to the computer, you had regarded the sample variance $S^2$ as a random variable. What are its population (theoretical) mean $E[S^2]$ and variance $Var[S^2]$? [Recall the formula $Var[S^2] = 2\sigma^4/(n-1)$ for normal data.]

(b) Make a DOTPLOT or histogram of the ten sample variances $S^2$ and turn it in with your assignment.

    > dotPlot(s2, xlab = "sample variance")
    > hist(s2)

(c) What are the sample average and sample variance of the ten values of $S^2$ that you generated?

    > mean(s2)
    > var(s2)

4) (a) With each of your samples, construct an 80% confidence interval for the true mean E[X] (which we happen to know is 100). R will do the work if you use the command

    > for(i in 1:10) cat(t.test(x[1:25, i], mu = 100, conf.level = 0.8)$conf.int, "\n")

You can use R to calculate your intervals, but do at least one by hand in order to show the needed calculations. (Please assume $\sigma$ is unknown.) (NOTE: If you do not specify a percentage, then R generates a 95% CI.)

    > t.test(x[1:25, 3], mu = 100, conf.level = 0.8)

(b) How many of the 80% confidence intervals actually contain the true mean?

(c) If confidence intervals are generated randomly, as we have done:
What is the probability that an interval that will be generated will contain the true mean $\mu$?
Of ten such intervals, how many on average will contain the true mean $\mu$?
Of ten such intervals, what is the probability that exactly 8 will contain $\mu$?
What is the variance of the number of intervals that will actually contain $\mu$?
[Think Binomial distribution when answering the last three questions. These are simple probability problems.]

5) The dotplots created with each of the ten samples with n = 25 may not look like a normal density function. Use R to generate one sample of 100 independent normal random variables with $\mu = 100$ and $\sigma = 30$. Generate a histogram or DOTPLOT and turn it in with your homework. Does the graph look better for n = 100 than for n = 25?
    > x <- rnorm(100, 100, 30)
    > dotPlot(x, xlab = "random number")
    > hist(x)

6) Section 7.2, D8 p. 284 [D7 p. 269]: using the data in problem 18, construct a 90% CI for the true mean strength of anchor bolts. (Large-sample confidence interval; show computation.)

7) Section 7.3, D8 p. 293 [D7 p. 277], #37a

8) Section 8.1, D8 pp. 308-309 [D7 pp. 293-94], #3, 4, 10abcd (simple hypothesis tests)

9) Section 8.2, D8 pp. 322-23, #34, 36 [D7 p. 306, #32, 34] (Student t and hypothesis tests)
(c) For #34 (#32 in D7), graph $\beta$, the Type II error, for n = 15 as a function of the true mean concentration $\mu$ over 90 pCi/L < $\mu$ < 110 pCi/L, assuming a standard deviation of 7.5 pCi/L. (See Figure 8.5, D8 p. 319 [Fig. 8.4, D7 p. 303].)
(d) For D8 #34 (D7 #32), did you use a one- or two-sided test? Justify your choice.

Learning objective: Students should know how (i) to select the hypotheses, (ii) to decide what tests are appropriate for different situations (large/small sample; one/two sided), (iii) to compute rejection regions for a test for a given type I error $\alpha$, (iv) to compute the type II error $\beta$, and (v) to determine the required sample size n to achieve a specified $\alpha$ and $\beta$.
__________________________________________________________________________

ANSWERS not in book.

2) $E[A] = \mu$; $Var[A] = \sigma^2/n$ where n = 25.

3) $E(S^2) = \sigma^2$ (unbiased); for normal observations: $Var(S^2) = 2\sigma^4/(n-1)$, where $\sigma^2$ is the variance of $X_i$.

5) It should look better!

6) CI = 4.01 to ???

8) #10a. $H_0: \mu = 1300$ versus $H_a: \mu > 1300$ [why?]
#10b. The sample average is unbiased with a standard error of 13.4. Type I error is 1%.
#10c. $\beta(1350) = \Pr(Z < -1.40) = 8.08\%$.
#10d. Reject if $\bar{X} \ge 1322$, so now $\beta(1350) = 1.88\%$; $\alpha$ got larger but $\beta$ became smaller.
<Not assigned> #10e. 1%, corresponding to a critical z of 2.33.

9) #34 (D7 #32). (a) t = -0.92, accept $H_0$. (b) n = 30
#36 (D7 #34). Look at the equation for $\beta$.
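The Type II error values listed for #10c and #10d can be checked numerically. The sketch below is a hedged illustration, not the book's solution: it uses only the null value 1300, the alternative 1350, and the standard error 13.4 quoted in the answers, and it assumes that the 1322 cutoff in #10d corresponds to a 5% significance level. The function name beta_upper is hypothetical.

```r
# Values taken from the answers above; the variable names are illustrative.
mu0 <- 1300      # null value of the mean
mu_alt <- 1350   # alternative value at which beta is evaluated
se <- 13.4       # standard error of the sample average

# Type II error of the upper-tailed z test at significance level alpha:
# the probability that the sample average falls below the rejection cutoff
# when the true mean is mu_alt.
beta_upper <- function(alpha) {
  cutoff <- mu0 + qnorm(1 - alpha) * se
  pnorm((cutoff - mu_alt) / se)
}

beta_01 <- beta_upper(0.01)   # 1% test, as in #10b and #10c
beta_05 <- beta_upper(0.05)   # assumed 5% test for #10d (cutoff near 1322)

cat("alpha = 0.01: beta =", round(beta_01, 4), "\n")
cat("alpha = 0.05: beta =", round(beta_05, 4), "\n")
```

The results should land close to the 8.08% and 1.88% listed above; small differences come from rounding z and the standard error.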
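For the hand computation requested in problem 4(a), the interval with $\sigma$ unknown is $\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$. The sketch below shows one way to reproduce that arithmetic in R and compare it with t.test(). The seed and the variable names are illustrative assumptions, so the numbers will differ from your own samples.

```r
# Sketch only: the seed and the variable names are illustrative assumptions.
set.seed(1)
n <- 25
x1 <- rnorm(n, mean = 100, sd = 30)   # one sample, as in problem 1

conf <- 0.80
alpha <- 1 - conf
t_crit <- qt(1 - alpha / 2, df = n - 1)   # upper alpha/2 point of t, n - 1 df

# Interval: xbar +/- t_crit * s / sqrt(n), with s estimated from the data.
half_width <- t_crit * sd(x1) / sqrt(n)
ci_hand <- c(mean(x1) - half_width, mean(x1) + half_width)

# The same interval as reported by t.test().
ci_r <- as.numeric(t.test(x1, mu = 100, conf.level = conf)$conf.int)

print(rbind(by_hand = ci_hand, t_test = ci_r))
```

The two rows should agree, which confirms that t.test() uses the Student t critical value with n - 1 degrees of freedom rather than a normal critical value.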