ECON1203


Semester 1 2016

***This document will be periodically updated with questions to be discussed in

succeeding tutorials, and re-posted to Moodle every fortnight.***

Weeks 1 and 2

1. (a) What is meant by a variable in a statistical sense? Distinguish between qualitative and

quantitative statistical variables, and between continuous and discrete variables. Give

examples.

(b) Distinguish between (i) a statistical population and a sample; (ii) a parameter and a

statistic. Give examples.

2. In order to know the market better, the second-hand car dealership, Anzac Garage, wants

to analyze the age of second-hand cars being sold. A sample of 20 advertisements for

passenger cars is selected from the second-hand car advertising/listing website

www.drive.com.au. The ages in years of the vehicles at time of advertisement are listed

below:

5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11

(a) Calculate the frequency, cumulative frequency and relative frequency distributions for

the age data using the following bin classes:

More than 0 to less than or equal to 8 years

More than 8 to less than or equal to 16 years

More than 16 to less than or equal to 24 years.

(b) Sketch a frequency histogram using the calculations in part (a). What can you say

about the distribution of the age of these second-hand cars? Is there anything that

concerns you about the frequency table and histogram? Specifically, is the choice of bin

classes appropriate? What needs to be done differently?

(c) Halve the width of the bins (0 to 4, 4 to 8, etc) and recalculate the frequency,

cumulative frequency and relative frequency distributions. Using the new distributions

and histogram, what can you now say about the distribution of the age of second-hand

cars?
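The tabulation asked for in parts (a) and (c) can be checked with a short script. This is a minimal sketch, not a model answer: the helper name `frequency_table` and its parameters are illustrative, and the bins are treated as right-closed, as in "more than 0 to less than or equal to 8".

```python
from collections import Counter

# Car ages in years, copied from the question.
ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]


def frequency_table(data, width, upper):
    """Tabulate data into right-closed bins (lo, lo + width] covering (0, upper]."""
    edges = list(range(0, upper, width))
    counts = Counter()
    for x in data:
        for lo in edges:
            if lo < x <= lo + width:
                counts[lo] += 1
                break
    rows, cumulative = [], 0
    for lo in edges:
        cumulative += counts[lo]
        # (bin start, bin end, frequency, cumulative frequency, relative frequency)
        rows.append((lo, lo + width, counts[lo], cumulative, counts[lo] / len(data)))
    return rows


for width in (8, 4):
    print(f"Bin width {width}:")
    for lo, hi, freq, cum, rel in frequency_table(ages, width=width, upper=24):
        print(f"  ({lo:>2}, {hi:>2}]  freq={freq:>2}  cum={cum:>2}  rel={rel:.2f}")
```

Running it for both widths makes it easy to compare how much detail each choice of bins hides or reveals.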

3. Health expenditure

A recent report by Access Economics provides a comparison of Australian expenditures

on health with that of comparable OECD countries. Data from that report relating to the

year 2005 have been used to reproduce their Figure 2.2 (below denoted as Figure 2.1).

(a) What are the key features of these data?

(b) While this is a bivariate scatter plot, there are three variables involved: health

expenditure, GDP and population. Why account for population by expressing health

expenditure and GDP in per capita terms?

[Figure 2.1: scatter plot of per capita health expenditure against per capita GDP.]

4. Recent research by Dr Nigel Stapledon at the UNSW School of Economics provides

an extensive analysis of Australian housing prices since 1880. In Figure 2.2 his data

are used to provide a comparison of Sydney and Melbourne housing prices over time.

(a)

(b)

[Figure 2.2: Sydney and Melbourne house prices in constant 2007-08 dollars (thousands of dollars), by year.]

5. (a) Calculate the mean, median and mode for this sample of data and use these statistics to further describe the distribution of car ages.

(b) If the largest observation were removed from this data set, how would the three measures of central tendency you have calculated change?
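The three measures can be computed with Python's standard `statistics` module. The sketch below assumes that "this sample of data" refers to the 20 car ages listed in the Anzac Garage question earlier in this tutorial set; the helper name `summarise` is illustrative.

```python
import statistics

# Assumption: the sample is the 20 car ages (in years) from the Anzac Garage question.
ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]


def summarise(data):
    """Return the three measures of central tendency for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
    }


full = summarise(ages)
# Drop the single largest observation to see which measures react to it.
without_max = summarise(sorted(ages)[:-1])

print("All observations:   ", full)
print("Largest one removed:", without_max)
```

Comparing the two printed summaries shows which measure is sensitive to an extreme value and which are not.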

6. For the following statistical population, compute the mean, range, variance and

standard deviation: 3, 3, 5, 12, 13, 14, 17, 20, 21, 21.

What would happen to each of the measures you have calculated if:

(a) 4 were added to each data point (observation)?

(b)

7. Migrant wealth.

Suppose the Minister for Immigration is interested in research on the assimilation of

migrant households (a household where the chief income-earner is foreign born). The

Household, Income and Labour Dynamics in Australia (HILDA) survey is a

representative survey of Australian households. Using 4,669 household observations

for 2002 from HILDA, we find there are 3,567 households classified as Australian-born and 1,102 classified as migrants. One key consideration is how migrant

households are doing in terms of wealth compared with Australian-born households.

Using these data, we find the following:

Summary statistics for net household wealth ($A)

                    Australian-born    Migrant
Mean                    236,064        248,970
10th percentile           1,545          1,720
Median                  123,020        131,152
90th percentile         560,006        524,372

(a) What can you say about the distribution of net household wealth, for both

Australian-born and migrant households, by looking at just the mean and the median

figures?

(b) More generally, what can you say about the distribution of wealth for migrant

households compared to that for Australian-born households? In particular, which

type of household has greater variation in wealth?

(c) Suppose the minister has net household wealth of $600,000. What can you say

about his or her financial circumstances relative to other Australian-born households?

8. Sydney housing prices.

Figure 3.2 depicts a scatter plot of Sydney-area housing prices versus distance from

the CBD. The unit of observation is a suburb, price is the mean of the median price of

houses sold in each suburb for two quarters (those ending in September and

(a) What would you expect the correlation to be between price and distance?

(b) Does it appear that there is a linear relationship between the two variables?

(c) What other key features of these data can be determined from the plot?

[Figure 3.2: House prices in Sydney suburbs versus distance to CBD. Vertical axis: price ($).]

9. Anzac Garage wants to develop guidelines for setting prices of cars according to the

car's age. They hire a business consultant who chooses a sample of 117 second-hand

passenger car advertisements collected from www.drive.com.au and retrieves data on

the age and price of the cars.

(a) The business consultant first calculates the correlation coefficient between age

and price and finds it to be -0.278. Interpret this result.

(b) Sketch what you think the scatter diagram from which this correlation coefficient

was calculated might look like. Suppose the business consultant constructs a simple

linear regression model using price as the dependent variable, and age as the

independent variable. What do you think the estimated regression line might look like

here? (We will return to this particular example later in the course and address this

question more formally.)

10. Big Data. Suppose you are sitting at the NSW Department of Health and have access

to information on hospital admissions, diagnosis, private insurance coverage, sex, age,

smoking status, and length of hospital stay for all patients at all NSW hospitals for

2000 through 2015. A team of statisticians in your department are available to

analyse these data following your direction.

(a) You get a phone call from the State treasurer wanting to know how much of your

budget you spend on smokers and smoking-related health problems. You promise

to get back to her, and put down the phone. What do you tell your team?

(b) You get a phone call from the Australian Council on Smoking and Health, asking

about any evidence that the State has on the association between smoking and

health outcomes. You promise to get back to them and put down the phone.

What do you tell your team?

11. Work through problem 34 on page 165 of Sharpe (Chapter 4).

Weeks 3 and 4

1. (a) Explain what it means to say that two probabilistic events in a sample space are

mutually exclusive of one another.

(b) Explain what it means to say that two probabilistic events in a sample space are

independent of one another.

(c) Why can two events not at the same time be both mutually exclusive and

independent of one another?

2. A department store wants to study the relationship between the way customers pay

for an item and the price of the item. 250 transactions are recorded and the following

table is formed.

                         Price category
Payment         Under $20    $20-$100    Over $100
Cash                15           11           6
Credit card          9           53          38
Debit card          18           52          48

Convert the table to a joint distribution. Express each of the following questions in

terms of probability statements, and then solve:

(a) What is the probability that an item is under $20?

(b) What is the probability that an item with a price tag of $43 is paid for in cash?

(c) What is the probability that people pay for an item that is at least $20 by credit?

(d) If somebody used a debit card to pay for an item, what is the probability that the

item was less than $100?

(e) Are price and means of payment independent?
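One way to organise the calculations is to turn the counts into a joint distribution and read marginal and conditional probabilities off it. The sketch below uses exact fractions. The dictionary layout and helper names are illustrative, and part (c) is read here as the joint probability of "credit card and price of at least $20", which is one possible interpretation of the wording.

```python
from fractions import Fraction

# Transaction counts from the table (rows: payment method, columns: price category).
counts = {
    "Cash": {"Under $20": 15, "$20-$100": 11, "Over $100": 6},
    "Credit card": {"Under $20": 9, "$20-$100": 53, "Over $100": 38},
    "Debit card": {"Under $20": 18, "$20-$100": 52, "Over $100": 48},
}
total = sum(sum(row.values()) for row in counts.values())

# Joint distribution: each cell count divided by the number of transactions.
joint = {
    (pay, price): Fraction(n, total)
    for pay, row in counts.items()
    for price, n in row.items()
}


def p_price(price):
    """Marginal probability of a price category."""
    return sum(p for (_, pr), p in joint.items() if pr == price)


def p_payment(pay):
    """Marginal probability of a payment method."""
    return sum(p for (pm, _), p in joint.items() if pm == pay)


prob_a = p_price("Under $20")                                  # P(price under $20)
prob_b = joint[("Cash", "$20-$100")] / p_price("$20-$100")      # P(cash | $20-$100)
prob_c = joint[("Credit card", "$20-$100")] + joint[("Credit card", "Over $100")]
prob_d = (joint[("Debit card", "Under $20")]
          + joint[("Debit card", "$20-$100")]) / p_payment("Debit card")

# Independence requires every joint cell to equal the product of its marginals.
independent = all(
    p == p_payment(pay) * p_price(price) for (pay, price), p in joint.items()
)

print(total, prob_a, prob_b, prob_c, prob_d, independent)
```

If part (c) is instead read as a conditional probability, divide `prob_c` by the marginal probability of the two higher price categories.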

3. In a small batch of 20 manufactured widgets, there are, in fact, 3 defective ones. You,

as quality control officer for the company making the widgets, decide to examine a

sample of 3 widgets, selected without replacement, to see how many defective ones

are selected.

(a) Use a probability tree to evaluate the probability distribution of the number of

defectives sampled.

(b) How would your answer change if the sampling were done with replacement?
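A probability tree is the method the question asks for. As an independent check, the same distribution can be obtained from counting formulas: the hypergeometric distribution for sampling without replacement and the binomial distribution for sampling with replacement. The sketch below is that cross-check, with illustrative function names, not a substitute for the tree.

```python
from fractions import Fraction
from math import comb

BATCH, DEFECTIVE, SAMPLE = 20, 3, 3  # values given in the question


def without_replacement(k):
    """P(k defectives in the sample) when sampled items are not returned."""
    ways = comb(DEFECTIVE, k) * comb(BATCH - DEFECTIVE, SAMPLE - k)
    return Fraction(ways, comb(BATCH, SAMPLE))


def with_replacement(k):
    """P(k defectives in the sample) when each item is returned before the next draw."""
    p = Fraction(DEFECTIVE, BATCH)
    return comb(SAMPLE, k) * p**k * (1 - p) ** (SAMPLE - k)


for k in range(SAMPLE + 1):
    print(k, without_replacement(k), with_replacement(k))
```

Each column of output should sum to 1, which is a quick sanity check on a hand-drawn tree as well.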

7. The manager of a factory has determined from past experience that X, the number of

repairs required to machines in her factory on any one day, has the following

probability distribution:

x:        [not shown]
P(X = x): 0.41, 0.25, 0.18, 0.10, 0.06

(a) P(1 < X < 4)

(b) P(0 ≤ X ≤ 3)

(c) E(X)

(d) Var(X)

(e) positive number of repairs taking place?

(f) Describe at least one business decision the manager might face that would be impacted by the information in the original table of unconditional probabilities.

(g) Describe at least one business decision the manager might face that would be impacted by the information in the table of conditional probabilities.

8. Suppose that the daily number of errors a randomly-selected bank teller makes is

denoted by X and follows the distribution given in the table below. A human resource

manager records the daily numbers of errors of two randomly selected tellers. Denote

the associated random variables by X1 and X2. As the selection is random, X1 and X2

are independent and follow the same distribution as X. The manager then computes

the sample mean $\bar{X} = \frac{X_1 + X_2}{2}$, where the sample size is n = 2.

x:        [not shown]
P(X = x): 0.6, 0.2, 0.2

a. Find the mean and variance of X1. Explain why we do not need to find the mean and variance of X2 once we know those of X1.

b. Since X1 and X2 are random, so is $\bar{X}$. Find the mean and variance of the random variable $\bar{X}$. Compare these with the result from (a) and comment. Hint: you will find it useful to note that Cov(X1, X2) = 0 because X1 and X2 are independent. This ...

c. Find the possible values that $\bar{X}$ may take. Hence list the probability distribution of $\bar{X}$ for samples of size 2. (This is known as the sampling distribution of $\bar{X}$.)

d. Examine briefly what would happen if n = 3, 4, ...? For this last sub-question, you will need to use the idea of a factorial of an integer n, labelled n!, which means n multiplied by every positive integer smaller than itself. So, for example, 3! = 3 × 2 × 1 = 6. Also recall the combinatorial formula for the number of ways of selecting x from n distinct objects (Sharpe page 193): $C_x^n = \frac{n!}{(n-x)!\,x!}$.
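The hint about the covariance in part b can be made concrete with a short derivation. The symbols $\mu$ and $\sigma^2$ are introduced here as shorthand for the mean and variance of X found in part a; they are not notation from the question itself.

```latex
\begin{align*}
\bar{X} &= \tfrac{1}{2}\,(X_1 + X_2), \\
E(\bar{X}) &= \tfrac{1}{2}\,\bigl[E(X_1) + E(X_2)\bigr] = \tfrac{1}{2}\,(\mu + \mu) = \mu, \\
\operatorname{Var}(\bar{X}) &= \tfrac{1}{4}\,\bigl[\operatorname{Var}(X_1) + \operatorname{Var}(X_2)
  + 2\operatorname{Cov}(X_1, X_2)\bigr]
  = \tfrac{1}{4}\,(\sigma^2 + \sigma^2 + 0) = \frac{\sigma^2}{2}.
\end{align*}
```

The sample mean therefore has the same expected value as a single observation but half the variance when n = 2.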

9. A student has enrolled in three courses in this semester. Let's call them courses A, B

and C. Her chances of passing each course are 0.8, 0.65, and 0.5, respectively.

Passing each course is assumed to be independent of passing other courses. Answer

the following:

a.

b.

What is the probability that this student passes exactly two courses? Express this

question in terms of probability statements, and then solve.

c.

What is the probability that this student fails at least one course? Express this

question in terms of probability statements, and then solve.

d.

a. What is the probability distribution of X?

b. What are the mean and variance of X?

c. Consider a game where you win $5 for every head but lose $3 for every tail that

appears in 4 tosses of a fair coin. Let the variable Y denote the winnings from this

game. Formulate the probability distribution of Y based on the probability

distribution of X.

d. What is the expected value of Y? Would you like to play this game? If so, why? If

not, why not?
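The definition of X is not visible in this copy of the question. The sketch below assumes X is the number of heads in 4 tosses of a fair coin, which is what part c implies, and builds the distribution of Y from it. The function name `winnings` is illustrative.

```python
from fractions import Fraction
from math import comb

TOSSES = 4  # from part c

# Assumption: X counts heads in 4 tosses of a fair coin, so X is Binomial(4, 1/2).
dist_x = {k: Fraction(comb(TOSSES, k), 2**TOSSES) for k in range(TOSSES + 1)}

mean_x = sum(k * p for k, p in dist_x.items())
var_x = sum((k - mean_x) ** 2 * p for k, p in dist_x.items())


def winnings(heads):
    """Win $5 per head and lose $3 per tail over the 4 tosses."""
    return 5 * heads - 3 * (TOSSES - heads)


# Y is a one-to-one function of X, so each value of Y inherits the probability of its X.
dist_y = {winnings(k): p for k, p in dist_x.items()}
mean_y = sum(y * p for y, p in dist_y.items())

print(dist_x, mean_x, var_x)
print(dist_y, mean_y)
```

Because `winnings` is linear in the number of heads, the expected value of Y can also be obtained by applying the same linear rule to the mean of X.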

Weeks 5 and 6

1. A random number generator is designed to draw numbers at random from within a

specified range. We can consider any number in the range as a possible outcome.

(a) What type of distribution is the random number generator drawing from?

(b) Suppose we program a random number generator to generate a random number

with a value falling in the interval [0, 2]. What is the height of the density of the

distribution from which the random number generator is drawing? Draw a graph

of the probability density function.

(c) What is the cumulative probability distribution of the random variable from

which draws are being taken? Draw a graph of the cumulative probability

distribution function.

(d) Find the following for this case: P(Y < 0.6); P(Y ≥ 0.6); P(0.5 < Y < 1.5), using both

the density function and the cumulative probability function. Show that your

answers match whichever you use.
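For a continuous uniform variable on [0, 2], the two routes in part (d) are the area of a rectangle under the density and a difference of cumulative probabilities. The following sketch, with illustrative function names, computes both so they can be compared.

```python
LOW, HIGH = 0.0, 2.0  # interval from part (b)


def pdf(y):
    """Density of a continuous uniform variable on [LOW, HIGH]."""
    return 1.0 / (HIGH - LOW) if LOW <= y <= HIGH else 0.0


def cdf(y):
    """Cumulative probability P(Y <= y)."""
    if y <= LOW:
        return 0.0
    if y >= HIGH:
        return 1.0
    return (y - LOW) / (HIGH - LOW)


def prob_by_density(a, b):
    """Rectangle area under the density between a and b (both inside [LOW, HIGH])."""
    return (b - a) * pdf((a + b) / 2)


def prob_by_cdf(a, b):
    """P(a < Y < b) as a difference of cumulative probabilities."""
    return cdf(b) - cdf(a)


for a, b in [(0.0, 0.6), (0.5, 1.5)]:
    print(f"P({a} < Y < {b}): density {prob_by_density(a, b):.3f}, cdf {prob_by_cdf(a, b):.3f}")
```

Since a single point has zero probability for a continuous variable, strict and non-strict inequalities give the same answers here.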

2. From several years' records, a fish market manager has determined that the weight of

deep sea bream sold in the market (X) is approximately normally distributed with a

mean of 450 grams and a standard deviation of 100 grams. Assuming this distribution

will remain unchanged in the future, calculate the expected proportions of deep sea

bream sold over the next year weighing

a) between 300 and 400 grams.

b) between 400 and 600 grams.

c) more than 625 grams.
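Answers obtained from standard normal tables can be checked against Python's built-in `statistics.NormalDist`. This is a minimal sketch using the mean and standard deviation given in the question; the variable names are illustrative.

```python
from statistics import NormalDist

# Weight of deep sea bream in grams, as described in the question.
weight = NormalDist(mu=450, sigma=100)

p_a = weight.cdf(400) - weight.cdf(300)   # between 300 and 400 grams
p_b = weight.cdf(600) - weight.cdf(400)   # between 400 and 600 grams
p_c = 1 - weight.cdf(625)                 # more than 625 grams

for label, p in [("a", p_a), ("b", p_b), ("c", p_c)]:
    print(f"({label}) {p:.4f}")
```

Small differences from table-based answers are expected because tables round z to two decimal places.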

normally distributed with a mean of $40,000 and a standard deviation of $6,000. What

proportion of households in the city have an annual income over $35,000? If a random

sample of 120 households were selected, how many of these households would we

expect to have annual incomes between $35,000 and $45,000?

4. What is the 75th percentile of the normal distribution N(10, 9)?

5. In a certain city, it is estimated that 60% of households have access to the internet. A

company wishing to sell services to internet users randomly chooses 150 households in

the city and sends them advertising material.

(a) Calculate the probability that fewer than 90 contacted households have internet access.

(b) Calculate the probability that between 60 and 100 (inclusive) contacted households have internet access.

(c) There is an 80% chance (probability of 0.8) that the number of contacted households with internet access equals or exceeds what value?
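The intended method is probably a normal approximation to the binomial. The sketch below instead sums the exact binomial probabilities, which gives a benchmark for judging how good the approximation is. Function and variable names are illustrative.

```python
from math import comb

N_HOUSEHOLDS, P_ACCESS = 150, 0.6  # values given in the question


def pmf(k):
    """Exact binomial probability that exactly k contacted households have access."""
    return comb(N_HOUSEHOLDS, k) * P_ACCESS**k * (1 - P_ACCESS) ** (N_HOUSEHOLDS - k)


def prob_between(lo, hi):
    """P(lo <= X <= hi), both ends inclusive."""
    return sum(pmf(k) for k in range(lo, hi + 1))


p_a = prob_between(0, 89)     # fewer than 90
p_b = prob_between(60, 100)   # between 60 and 100 inclusive

# (c) largest value k such that P(X >= k) is still at least 0.8
k_c = max(k for k in range(N_HOUSEHOLDS + 1) if prob_between(k, N_HOUSEHOLDS) >= 0.8)

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) {k_c}")
```

When comparing with a normal approximation, remember that a continuity correction (for example using 89.5 rather than 90) usually brings the two answers closer together.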

6. Using your personalized Course Project data:

(a) Calculate the sample averages of all variables. Which of these averages are

meaningful? Express the meaning of each average in words that are

understandable and effective for a layperson such as your client.

(b) Do you need to manipulate the raw data provided, before proceeding to

statistical analyses, in order to address the client's question? If so, how?

7. Work through problem 28 on page 264 of Sharpe (Chapter 7), referring to the 68-95-99.7

Rule explained on pages 239-240 of Sharpe.

8. UNSW wants to measure the attractiveness of its brand to potential students. The

university performs an experiment by inviting 100 high school students from different

public schools across New South Wales to browse a few websites related to different

universities, and then to choose the one that they would prefer most.

(a) Is this a random sample? Can you think of any potential source of selection bias?

(b) Suppose that a perfectly random sample of students is drawn from the target population, and these students take part in the exercise described above. With reference to the brief discussion on page 732 of Sharpe (Confounding and Lurking Variables), can you think of any confounding factors, that is, factors that might lead to lack of confidence in using students' expressed preferences, as measured in this exercise, as an indicator of their degree of overall attraction to the UNSW brand?

(c) Suppose that the exercise described in part (b) is conducted. The resulting data include each student's high school, the selection of universities whose websites they browsed, and the one amongst those that they chose as their most-preferred university. Sketch on a piece of paper or in an Excel sheet what these data would look like once they are made ready for quantitative analysis.

(d) Add to the display in part (c) any additional variables that your answer to part (b) indicated you might like to have access to. Show these variables in a form that is analysis-ready.

(e) Suppose you had access to the expanded data set constructed in part (d). Describe what sort of analyses you could conduct that might help to shed light on UNSW's core question about the attractiveness of its brand.

(f) Based on your analysis, what would you be able to tell UNSW leadership about the core drivers of its brand appeal, that is, what it is about UNSW that students are drawn to?

9. Work through problem 22 on page 325 of Sharpe (Chapter 9).

Weeks 7 and 8

1. Suppose a normally distributed random variable X has a mean of 50 and a variance of 100.

Also suppose a sample of size 16 is drawn from this population. Calculate the following

probabilities:

(a) P(40 < X < 55)

(b) P(40 < $\bar{X}$ < 55)
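The two probabilities differ only in the standard deviation used: a single observation has standard deviation 10, while the mean of 16 observations has standard deviation 10/4. The sketch below assumes the second probability refers to the sample mean, since the symbol is missing in this copy of the question.

```python
from math import sqrt
from statistics import NormalDist

MU, VARIANCE, N = 50, 100, 16  # values given in the question

# Distribution of a single observation X.
x = NormalDist(MU, sqrt(VARIANCE))
# Distribution of the sample mean of N observations (assumed reading of the second part).
x_bar = NormalDist(MU, sqrt(VARIANCE / N))

p_single = x.cdf(55) - x.cdf(40)
p_mean = x_bar.cdf(55) - x_bar.cdf(40)

print(f"P(40 < X < 55)     = {p_single:.4f}")
print(f"P(40 < X-bar < 55) = {p_mean:.4f}")
```

The second probability is larger because averaging concentrates the distribution around the mean.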

2. Recall the Anzac Garage data used previously. These data are available from the

course website (in the Tutorial Questions and Information folder) in an Excel file

called AnzacG.xls. Use these 117 observations on used passenger cars to find the 95%

confidence interval for the population mean distance travelled by used passenger cars

(this variable is labelled odometer in the data set and is measured in kilometres).

Assume the population standard deviation is 60,000 kms.

3. What would be the effects on the width of the confidence interval calculated in the

previous question of:

(a) a decrease in the level of confidence used?

(b) an increase in sample size?

(c) an increase in the population standard deviation?

(d) an increase in the sample standard deviation?

(e) an increase in the value of $\bar{X}$ found?

4. Again referring to the data in odometer from AnzacG.xls and the population from

which it is drawn, determine the sample size required to estimate the population mean

to within 5,000 kms with 90% confidence. Again assume the population standard

deviation is 60,000 kms.
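With a known population standard deviation, the required sample size follows from setting the margin of error z·σ/√n equal to the target and solving for n, then rounding up. A minimal sketch of that calculation, with an illustrative function name:

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence):
    """Smallest n such that z * sigma / sqrt(n) <= margin for a two-sided interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


n_needed = required_sample_size(sigma=60_000, margin=5_000, confidence=0.90)
print(n_needed)
```

Rounding up rather than to the nearest integer guarantees the margin of error is not exceeded.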

5. Perform the following hypothesis tests of the population mean. In each case, draw a picture to illustrate the rejection regions on both the Z and $\bar{X}$ distributions, and calculate the p-value of the test.

(a) $H_0: \mu = 50$, $H_1: \mu > 50$, $n = 100$, $\bar{x} = 55$, $\sigma = 10$, $\alpha = 0.05$

(b) $H_0: \mu = 25$, $H_1: \mu < 25$, $n = 100$, $\bar{x} = 24$, $\sigma = 5$, $\alpha = 0.1$

(c) $H_0: \mu = 80$, $H_1: \mu \neq 80$, $n = 100$, $\bar{x} = 80.5$, $\sigma = 4$, $\alpha = 0.05$
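The test statistics and p-values can be checked numerically. The Greek symbols are missing from this copy of the question, so the sketch assumes the listed numbers are, in order, the hypothesised mean, the sample size, the sample mean, the population standard deviation and the significance level. The function name `z_test` and its `tail` argument are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_test(mu0, x_bar, sigma, n, tail):
    """Return (z, p-value) for a test of the mean with known sigma.

    tail is "upper" (H1: mu > mu0), "lower" (H1: mu < mu0) or "two-sided".
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "upper":
        p = 1 - std_normal.cdf(z)
    elif tail == "lower":
        p = std_normal.cdf(z)
    elif tail == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError(f"unknown tail: {tail}")
    return z, p


# Assumed reading of the three cases: (mu0, x_bar, sigma, n, tail, alpha).
cases = {
    "a": (50, 55, 10, 100, "upper", 0.05),
    "b": (25, 24, 5, 100, "lower", 0.10),
    "c": (80, 80.5, 4, 100, "two-sided", 0.05),
}

results = {}
for label, (mu0, x_bar, sigma, n, tail, alpha) in cases.items():
    z, p = z_test(mu0, x_bar, sigma, n, tail)
    results[label] = (z, p, p < alpha)
    print(f"({label}) z = {z:.2f}, p-value = {p:.4g}, reject H0: {p < alpha}")
```

Rejecting when the p-value is below the significance level is equivalent to the test statistic falling in the rejection region drawn on the Z distribution.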

6. A real estate expert claims the current mean value of houses in a particular area is

more than $250,000. A random sample of 150 recent sales prices in the area yields a

sample mean of $265,000. It is known that house values in the area are

approximately normally distributed with a standard deviation of $50,000.

(a) Perform an upper tail test of the null hypothesis that the population mean house

value in the area is $250,000. Use a 5% level of significance and state the

rejection (critical) region in terms of both $\bar{X}$ and z.

(b) Why is an upper tail test most appropriate in this case?

(c) What is the p-value associated with the test statistic used in the part (a) test?

Interpret this value.

(d) Define in words the type I and II errors that could afflict the part (a) test.
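The upper tail test in part (a) can be laid out step by step: standard error, test statistic, critical values on both scales, and p-value. This is a sketch under the figures stated in the question; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU0, X_BAR, SIGMA, N, ALPHA = 250_000, 265_000, 50_000, 150, 0.05

std_error = SIGMA / sqrt(N)                    # standard deviation of the sample mean
z_stat = (X_BAR - MU0) / std_error             # observed test statistic
z_crit = NormalDist().inv_cdf(1 - ALPHA)       # rejection region: z > z_crit
x_bar_crit = MU0 + z_crit * std_error          # same region on the sample-mean scale
p_value = 1 - NormalDist().cdf(z_stat)         # upper tail probability
reject = z_stat > z_crit

print(f"z = {z_stat:.3f}, critical z = {z_crit:.3f}")
print(f"critical sample mean = {x_bar_crit:,.0f}")
print(f"p-value = {p_value:.5f}, reject H0: {reject}")
```

The decision is the same whichever scale is used, since the two critical values are linear transformations of each other.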

7. What effect does increasing the sample size have on the outcome of a hypothesis test?

Explain your answer using the example of a one-tail test concerning the mean of a

normally distributed population with known variance.

8. Work through problem 40 on page 420 of Sharpe (Chapter 12).

Then, re-do the analysis with all settings the same except supposing that:

c) The professor's students scored 108 points on the final exam, having used the

software (and nothing else changed).

d) The number of students enrolled in the course decreased from 481 to 210

(and nothing else changed).

e) The standard deviation of the students' scores increased from 6.3 to 25.2

points (and nothing else changed).

9. Project Review: For the course project, you are only expected to use statistical

methods covered in lectures and tutorials up to and including those in Week 9. Thus

you should now have sufficient material to complete the project in a timely fashion.

What might be useful at this stage is to think about presentation. See the Examples of

Statistical Reports section of the Project folder on Moodle for some ideas in general.

As a directed exercise for this tutorial, compare and contrast the presentation of

material in the NSW BOCSAR report on driving under the influence of cannabis

(driving-cannabis.pdf) and Queensland Office of Economic and Statistical Research

bulletin on computer and internet usage in Queensland (computer-internet-useageqld-c01.pdf). You should be able to read these reports comfortably, although there are

a few methods that may be unfamiliar in the cannabis report (although these methods

will be covered later in the course).
