
UCSD Extension

(March 12, 2012)



Group Members:
Hernandez, Agnieszka; Mathew, Babu; De Leon, Luis;
Yanamandra, Shanthi; Olin, Thomas.




Introduction
In light of the U.S. economic crisis, U.S. customers have started to take a product's origin into
account when making a purchase decision, including a preference for U.S. products as a way of
supporting the national economy and national development. This report investigates whether
supporting U.S.-origin products such as cars is, in general, a cost-effective decision. We also
test whether there is a strong relation between car price and car fuel economy (MPG).
Our team chose these analysis topics because they allowed us to perform all three required types
of analysis. We also judged that the results would be current and valuable to magazine readers:
they cover market trends and the factors that matter to consumers choosing between domestic and
foreign-manufactured cars.
In the first section, we examine whether the populations of US-origin and non-US-origin cars are
homogeneous with respect to car type. The result is used to interpret the analysis in the second
section, where the two population means are compared. This step is undertaken because both
populations consist of various car types that affect the response variable, Car Price,
differently. It would not be needed if we had a large enough sample within each car type segment
to perform the analysis on each car type separately; in that case, population homogeneity would
not be in question.
In the second section, we examine whether the average price of US-origin cars differs
significantly from that of non-US-origin cars, testing the assumption that American cars are less
expensive than non-US cars. This tells us whether buying a US-origin car is an effective decision
from the price perspective alone, without taking into account other important factors such as
fuel economy and reliability. Customers should most likely consider all of these other factors
when making a particular purchase decision. The results of our analysis will help us, as well as
future car customers, make informed decisions when comparing the prices of domestic and foreign
cars.
In the third section, we examine selected factors other than car origin that might affect car
prices in general. The selected explanatory variable is city fuel economy (in the first
regression model) and highway fuel economy (in the second regression model). As mentioned
earlier, many more price factors could be considered, such as dependability (car reliability,
customer satisfaction), but these particular explanatory variables were selected based on what is
available in the given data set. We examine how strong a correlation exists between city fuel
economy and price, and between highway fuel economy and price, by performing regression analysis
and determining what percentage of the variation in car price the chosen variable explains.



Data Discussion:
We performed some data transformations before working on the different analyses. To maintain
consistency, we used the same data for the whole report. The transformations are described in
detail below, and the data is in the appendix.
The numbers of US and Non-US cars of each type are shown below.
Type       US   Non-US
Compact     7        9
Mid-Size   10       11
Small       7       14
Sporty      8        6
Van         5        4
Large      11        0

The table shows that the Non-US sample contains no cars of type 'Large'. For the purpose of this
study, type 'Large' is excluded, because it could bias the outcome.
Similarly, because the sample sizes differ, we make the sample size the same for each car type in
order to reduce bias. This is accomplished by:
- Removing two (2) randomly selected Compact cars from the Non-US sample,
- Removing one (1) randomly selected Mid-Size car from the Non-US sample,
- Removing seven (7) randomly selected Small cars from the Non-US sample,
- Removing two (2) randomly selected Sporty cars from the US sample,
- Removing one (1) randomly selected Van from the US sample.
Once these changes are made, the new sample distribution looks as follows:

Type       US   Non-US
Compact     7        7
Mid-Size   10       10
Small       7        7
Sporty      6        6
Van         4        4
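
The balancing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name balance_by_type, the record layout and the placeholder ids are ours, and the random seed is arbitrary. The type counts are the ones reported in this section, with 9 Non-US Compact cars so that removing two leaves seven.

```python
import random
from collections import defaultdict


def balance_by_type(us_cars, non_us_cars, seed=0):
    """Randomly drop cars so both origins have equal counts within each type."""
    rng = random.Random(seed)

    def group(cars):
        by_type = defaultdict(list)
        for car in cars:
            by_type[car["type"]].append(car)
        return by_type

    us, non_us = group(us_cars), group(non_us_cars)
    balanced_us, balanced_non_us = [], []
    # Only types present in both samples are kept, which drops "Large".
    for car_type in sorted(set(us) & set(non_us)):
        n = min(len(us[car_type]), len(non_us[car_type]))
        balanced_us += rng.sample(us[car_type], n)
        balanced_non_us += rng.sample(non_us[car_type], n)
    return balanced_us, balanced_non_us


# Placeholder records (hypothetical ids) built from the reported type counts.
us_counts = {"Compact": 7, "Mid-Size": 10, "Small": 7, "Sporty": 8, "Van": 5, "Large": 11}
non_us_counts = {"Compact": 9, "Mid-Size": 11, "Small": 14, "Sporty": 6, "Van": 4}
us_cars = [{"type": t, "id": f"US-{t}-{i}"} for t, n in us_counts.items() for i in range(n)]
non_us_cars = [{"type": t, "id": f"NonUS-{t}-{i}"} for t, n in non_us_counts.items() for i in range(n)]

balanced_us, balanced_non_us = balance_by_type(us_cars, non_us_cars)
print(len(balanced_us), len(balanced_non_us))
```

Sampling within each type, rather than from the pooled list, is what keeps the two samples matched type by type.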


NOTE: A significance level of 0.05 is used throughout the report, because we judged that the
consequences of making a Type I error were not severe.









Q1: Chi-Square test of Homogeneity between types of cars for U.S. and Non-U.S. origin cars
Descriptive Statistics:
The study focuses on car types by American and Non-American origin. In the descriptive
statistics, we look at the distribution of the various car types by origin. The table and diagram
below show the frequency distribution of the types of Non-U.S. origin cars. The sample data is
given in the appendix (Table 1.1 & Table 2.1).
For the sake of analysis, the descriptive car types have been translated into numerical codes, as
given below.
(Table 1) Car Types and Descriptions
Car Types   Description
1           Compact
2           Small
3           Mid Size
4           Sporty
5           Van
6           Large

Types of Cars   Frequency   Cumulative %
1                       9         20.45%
2                      14         52.27%
3                      11         77.27%
4                       6         90.91%
5                       4        100.00%
6                       0        100.00%


From the frequency table we can see that most of the Non-U.S. cars belong to Type 2, which is
the 'Small' type (from Table 1 above). It is also evident that about 77% of the cars fall under
the first three types (Compact, Small and Mid Size). From this we infer that Non-U.S. car
manufacturers focus more on the smaller car segments.
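
As a quick check on the cumulative percentages, they can be recomputed from the frequencies. The sketch below is illustrative only; the helper name cumulative_percent is ours, and the frequencies are those of the Non-U.S. table above.

```python
from itertools import accumulate


def cumulative_percent(frequencies):
    """Cumulative share of the total, in percent, rounded to two decimals."""
    total = sum(frequencies)
    return [round(100 * running / total, 2) for running in accumulate(frequencies)]


# Non-U.S. frequencies for car types 1 to 6 (Compact, Small, Mid Size, Sporty, Van, Large).
non_us_frequencies = [9, 14, 11, 6, 4, 0]
non_us_cumulative = cumulative_percent(non_us_frequencies)
print(non_us_cumulative)
```

The third entry is the share covered by the first three car types.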
The table and diagram below give the descriptive data for U.S. cars.
Car Types   Frequency   Cumulative %
1                   7         14.58%
2                   7         29.17%
3                  10         50.00%
4                   8         66.67%
5                   5         77.08%
6                  11        100.00%

[Figure: Non US Originated Cars: Frequency and Cumulative % by Car Types]

From the frequency distribution for U.S. cars, the car type with the highest frequency is 'Large'
(number 6 in Table 1 above). From the cumulative frequency distribution, the first three types
(Compact, Small and Mid Size) cover only 50% of the cars. The other 50% is distributed among the
Sporty, Van and Large types. This could mean that US manufacturers either target the entire
spectrum of the vehicle market or have no real focus on any given segment.
Inferential Statistics: CHI-SQUARE TEST
Because we will compare the average price of American cars with that of Non-American cars, we
want to verify whether both data sets have the same proportions of cars of each type. The car
type proportions within each population matter because car type characterizes a car by a specific
set of features, such as car size or engine size, that heavily affect car prices. For that
purpose we employ the chi-square test to check the homogeneity of the American and Non-American
car populations with respect to car type.
[Figure: US based Cars: Frequency and Cumulative % by Car Types]

Based on the available sample and the subject of our study, we test whether the American and
Non-American populations are homogeneous, that is, whether they have the same proportions of
each car type.
The test hypotheses are as follows:
$H_0$: American cars and Non-American cars have the same proportions of car types in their
populations.
$H_1$: American cars and Non-American cars have different proportions of car types in their
populations.

The count of cars by type for each population is summarized in the contingency table below:
Cars Type   American Cars   Non-American Cars   Total
Compact                 7                   9      16
Large                  11                   0      11
Midsize                10                  11      21
Small                   7                  14      21
Sporty                  8                   6      14
Van                     5                   4       9
Total                  48                  44      92

The table below shows the observed ($O_i$) and expected ($E_i$) frequencies used to calculate the
test statistic:

             Observed                             Expected
Cars Type    American Cars   Non-American Cars    American Cars   Non-American Cars
Compact                  7                   9              8.3                 7.7
Large                   11                   0              5.7                 5.3
Midsize                 10                  11             11.0                10.0
Small                    7                  14             11.0                10.0
Sporty                   8                   6              7.3                 6.7
Van                      5                   4              4.7                 4.3

where the expected count in a given category for a given sample is calculated using the
following formula:

$$E_i = \frac{(\text{row total}) \times (\text{column total})}{\text{table total}}$$

Given the expected counts, all applicable test requirements are satisfied:
1. All expected counts are greater than or equal to 1, and
2. No more than 20% of the expected frequencies are less than 5 (2 of the 12 are).
The level of significance is $\alpha = 0.05$, because we judged that the consequences of making a
Type I error were not severe.
The test statistic is as follows:

$$\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i} = 13.88$$

This test statistic has a $\chi^2$ distribution with $(r-1)(c-1) = (6-1)(2-1) = 5$ degrees of
freedom.

(Classical Approach)
There are r = 6 rows and c = 2 columns, so we find the critical value of the $\chi^2$
distribution using (6-1)(2-1) = 5 degrees of freedom.
Because this is a right-tailed test, the critical value at the $\alpha = 0.05$ level of
significance with 5 degrees of freedom is 11.070.
Because the test statistic, 13.88, is greater than the critical value, 11.070, we reject the null
hypothesis.

(P-Value Approach)
The P-value is the area under the $\chi^2$ distribution with 5 degrees of freedom to the right of
$\chi^2_0 = 13.88$.
In the row of the $\chi^2$ distribution table that corresponds to 5 degrees of freedom, the value
13.88 lies between 12.833 and 15.086. The area to the right of 12.833 is 0.025, and the area to
the right of 15.086 is 0.01. So 0.01 < P-value < 0.025.
Because the P-value is less than the level of significance, $\alpha = 0.05$, we reject the null
hypothesis.
Conclusion:
There is sufficient evidence at the $\alpha = 0.05$ level of significance to conclude that the
distribution of car types for American cars is different from that for Non-American cars.
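
For readers who want to check a homogeneity test by script, here is a minimal plain-Python sketch. It is not the spreadsheet the team used: the function name chi_square_homogeneity is ours, the counts are those of the contingency table by car type (11 Large cars in the American sample, none in the Non-American sample), and the critical value is the standard table value for 5 degrees of freedom at the 0.05 level.

```python
def chi_square_homogeneity(observed):
    """Return (statistic, degrees_of_freedom, expected) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = row total * column total / table total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Rows: Compact, Large, Midsize, Small, Sporty, Van.
# Columns: American, Non-American.
observed = [[7, 9], [11, 0], [10, 11], [7, 14], [8, 6], [5, 4]]
CRITICAL_VALUE = 11.070  # chi-square table, 5 degrees of freedom, alpha = 0.05

statistic, dof, expected = chi_square_homogeneity(observed)
reject_null = statistic > CRITICAL_VALUE
print(round(statistic, 2), dof, reject_null)
```

The degrees of freedom follow from the table shape, so dropping or adding a car type changes both the statistic and the critical value that applies.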



















Q2: Hypothesis test and CI for two Means
Test to see if the average price of U.S. cars is less than the average price of Non-U.S. cars
Description:
A sample data set was provided about cars of various origins, including details about price,
mileage and many other characteristics. We conduct a test to check whether the average price of
Non-US-origin cars is higher than the average price of US-origin cars.
We are testing a hypothesis regarding the difference of two means from independent samples.
To perform the test without knowing the population standard deviations, we verified that the
following requirements were met:
1. The samples were obtained using simple random sampling;
2. The samples are independent;
3. Both sample sizes are greater than 30.
The hypotheses are:
$H_0$: Average price of Non-US-origin cars = Average price of US-origin cars
$H_1$: Average price of Non-US-origin cars > Average price of US-origin cars
The Origin variable distinguishes cars made in the US from cars made outside the US. In this
study, we compare the prices of the cars based on Origin.
Descriptive Statistics:
The number of cars in each price range is shown below. For ease of calculation, the midpoint of
each price range was identified, and the distribution is given by midpoint price.
Frequency & Histogram information for Non-US cars
Mid-Price Point   Frequency   Cumulative %
 9                        3          8.82%
14                        7         29.41%
19                        5         44.12%
24                        9         70.59%
29                        3         79.41%
34                        4         91.18%
39                        2         97.06%
50                        1        100.00%


For Non-US cars, the largest number of cars (regardless of car type, such as compact, mid-size
etc.) has a price midpoint of 24, followed by 14.
[Figure: Non US Based Car Prices: Frequency and Cumulative % by Bin]
Frequency & Histogram information for US cars
Mid-Price Point   Frequency   Cumulative %
 9                        2          5.88%
14                       11         38.24%
19                       13         76.47%
24                        3         85.29%
29                        2         91.18%
34                        0         91.18%
39                        2         97.06%
50                        1        100.00%



In the case of U.S. cars, most of the cars (regardless of car type, such as compact, mid-size
etc.) fall at the 19 price point, followed by the 14 price point.
[Figure: US based Car price: Frequency and Cumulative % by Mid Point Pricing]
Summary statistics
Price Stats          Non-U.S. cars   U.S. cars
Minimum                       8.30        7.30
First Quartile               12.85       11.60
Median                       19.30       15.65
Third Quartile               27.53       18.88
Maximum                      47.90       40.10
Mode                         19.10       11.10
IQR                          14.68        7.28
Lower Fence                  -9.16        0.69
Upper Fence                  49.54       29.79
Sample Size                     34          34
Mean                         20.82       17.04
Standard Deviation            9.56        7.76


Inferential Statistics:
For the purpose of the analysis, the following statistics will be used (the sample is provided
in the appendix).
Step 1: Let $\mu_1$ represent the mean price of the Non-US cars and $\mu_2$ the mean price of
the US cars. We are trying to show that $\mu_1 > \mu_2$. With that,
$$H_0: \mu_1 = \mu_2 \quad \text{or} \quad H_0: \mu_1 - \mu_2 = 0$$
$$H_1: \mu_1 > \mu_2 \quad \text{or} \quad H_1: \mu_1 - \mu_2 > 0$$
Step 2: The level of significance is $\alpha = 0.05$, because we judged that the consequences of
making a Type I error were not severe.
Step 3: The sample statistics are given above. The test statistic is
$$t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
= \frac{(20.821 - 17.044) - 0}{\sqrt{\frac{9.56^2}{34} + \frac{7.756^2}{34}}}
= \frac{3.777}{2.11} = 1.79$$
Step 4 (Classical Approach): This is a right-tailed test. Since the size of both samples is 34,
we use 34 - 1 = 33 degrees of freedom. We need $t_{\alpha}$ with $\alpha = 0.05$ and 33 degrees
of freedom. The critical value is 1.692.
Step 5: Since the test statistic, 1.79, is greater than the critical value, 1.692, we reject the
null hypothesis.
Step 4 (P-Value Approach): Since this is a right-tailed test, the P-value is the area to the
right of $t_0 = 1.79$. Using the t-distribution table with 33 degrees of freedom, 1.79 lies
between 1.692 and 2.035, which correspond to right-tail areas of 0.05 and 0.025.
Step 5: That is, 0.025 < P-value < 0.05. Since the P-value is less than $\alpha$, we reject the
null hypothesis.
Step 6: There is sufficient evidence to conclude that the mean price of Non-US-origin cars is
greater than the mean price of US-origin cars.
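
The test statistic can be recomputed from the summary statistics alone. The following is a minimal sketch, assuming the means, standard deviations and sample sizes reported above; the function name two_sample_t is ours, and the critical value 1.692 is the table value for a right-tailed test at the 0.05 level with 33 degrees of freedom.

```python
import math


def two_sample_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Two-sample t statistic with unpooled variances, plus its standard error."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t0 = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return t0, standard_error


# Sample 1: Non-US cars. Sample 2: US cars.
t0, standard_error = two_sample_t(20.821, 9.56, 34, 17.044, 7.756, 34)
T_CRITICAL = 1.692  # t table, right tail, alpha = 0.05, 33 degrees of freedom
reject_null = t0 > T_CRITICAL
print(round(t0, 2), round(standard_error, 2), reject_null)
```

The margin over the critical value is small, so the decision is sensitive to the choice of significance level.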
Computing Confidence Intervals
The goal is to find the confidence interval at the 95% level of confidence, using the following
formula (Equation 3 from 11.2):
Lower Bound: $(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Upper Bound: $(\bar{x}_1 - \bar{x}_2) + t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Substituting the values, with $t_{\alpha/2} = t_{0.025} = 2.035$ for 33 degrees of freedom:
Lower Bound: $(20.821 - 17.044) - 2.035 \cdot \sqrt{\frac{9.56^2}{34} + \frac{7.756^2}{34}}$
= 3.777 - 2.035 * 2.11
= 3.777 - 4.294
= -0.517

Upper Bound: $(20.821 - 17.044) + 2.035 \cdot \sqrt{\frac{9.56^2}{34} + \frac{7.756^2}{34}}$
= 3.777 + 2.035 * 2.11
= 3.777 + 4.294
= 8.071
Conclusion: We are 95% confident that the difference between the mean price of Non-US cars and
that of US cars is between -0.517 and 8.071. The interval is two-sided, so it contains 0 even
though the one-tailed test above rejected the null hypothesis.
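
The interval can be checked with a short script. This is a sketch under the same assumptions as the hand calculation: summary statistics from the table and a table value of 2.035 for $t_{0.025}$ with 33 degrees of freedom. The function name mean_difference_ci is ours. Because the script does not round the standard error to 2.11, its bounds can differ from the hand calculation in the third decimal place.

```python
import math


def mean_difference_ci(mean1, s1, n1, mean2, s2, n2, t_critical):
    """Confidence interval for mu1 - mu2 with unpooled variances."""
    diff = mean1 - mean2
    margin = t_critical * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin


lower, upper = mean_difference_ci(20.821, 9.56, 34, 17.044, 7.756, 34, t_critical=2.035)
contains_zero = lower < 0 < upper
print(round(lower, 2), round(upper, 2), contains_zero)
```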













Q3: Regression: Analysis of the Relationship between Mileage and Price of a car
The variables Price, City MPG and Highway MPG from the cars dataset were analyzed to see whether
a linear relation exists between the price of a car and its mileage. These variables were picked
because a customer who understands this relationship can make an informed decision about a car
based on its purchase price and the associated fuel expenses, and can plan for the total car
expense. Knowing whether expensive cars tend to have good or poor mileage helps the customer
decide which car is the better fit.
For consistency in the underlying data, we used the dataset with the Large type cars removed and
the randomly selected cars removed. The modified dataset is described in the Data Discussion,
and the raw data can be seen in the Appendix.
Least-squares regression was used to analyze the relationship between price and mileage. The
analysis is split into 2 cases: one with City MPG and the other with Highway MPG. The explanatory
variable in both cases is the mileage, and the response variable is the Price. Depending on
whether the car is used mostly for city or highway driving, the customer can consider the
appropriate relationship.
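
For reference, the least-squares slope, intercept and correlation coefficient can be computed directly from their definitions. The sketch below is not the Excel workflow used for this report, and the (MPG, price) pairs in it are made up purely for illustration; the function name least_squares_fit is ours.

```python
def least_squares_fit(x, y):
    """Return (slope, intercept, r) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical data for illustration only; these are not the report's observations.
mpg = [18, 20, 22, 25, 29, 33]
price = [26.0, 21.5, 18.0, 16.5, 12.0, 9.5]

slope, intercept, r = least_squares_fit(mpg, price)
print(round(slope, 3), round(intercept, 3), round(r, 3), round(r**2, 3))
```

Squaring r gives the coefficient of determination, which is the quantity used later to compare the city and highway models.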
The data was obtained by random sampling, and the analysis is performed with 68 observations.
Below are the summary statistics:
Statistic            Price   City MPG   Highway MPG
Minimum                7.4         15            20
First Quartile        12.1         18            25
Median                16.4         22          28.5
Third Quartile        22.7         25         31.25
Maximum               47.9         46            50
Mode                  11.1         18            30
IQR                   10.6          7          6.25
Lower Fence           -3.8          8            16
Upper Fence           38.6         36            41
Sample Size             68         68            68
Mean                 18.93      22.41         29.07
Standard Deviation    8.85       5.38          5.44

The Outliers in Price are: 40.1, 47.9.
The Outliers in City MPG are: 39, 46.
The Outliers in Highway MPG are: 43, 50.
Distribution of the data-Price:

Price
Bin   Frequency   Cumulative %
 7            0          0.00%
14           23         33.82%
21           26         72.06%
28            8         83.82%
35            6         92.65%
42            4         98.53%
49            1        100.00%

[Figure: Histogram-Price: Frequency and Cumulative % by Bin]
City MPG:

City MPG
Bin   Frequency   Cumulative %
14            0          0.00%
21           33         48.53%
28           26         86.76%
35            7         97.06%
42            1         98.53%
49            1        100.00%

[Figure: Histogram-City MPG: Frequency and Cumulative % by Bin]

Highway MPG:

Highway MPG
Bin   Frequency   Cumulative %
18            0          0.00%
22            5          7.35%
26           19         35.29%
30           21         66.18%
34           15         88.24%
38            5         95.59%
42            1         97.06%
46            1         98.53%
50            1        100.00%

[Figure: Histogram-Highway MPG: Frequency and Cumulative % by Bin]
Inferential Statistics
Case 1: Least Squares Regression - Analyze relationship between City MPG and Price.
Below is the scatter plot with the line and the equation.

The equation of the line is:
y = -1.0538x + 42.55
Slope and Y-intercept:
The slope is -1.0538, which means that if city mileage increases by 1 MPG, the predicted price
of the car decreases by about 1.05, on average. The two variables are negatively associated.
The y-intercept is 42.55. Because 0 MPG, or values close to 0, do not make sense for mileage,
the y-intercept has no practical interpretation.
[Figure: Scatter Plot of Price versus City MPG, with fitted line y = -1.0538x + 42.55 and $R^2$ = 0.4106]
Linear correlation coefficient and the coefficient of determination:
>> The linear correlation coefficient r is calculated (using Excel):
r = -0.64075
This implies a moderate negative linear relation between price and city mileage.
>> The coefficient of determination $R^2$ is also calculated in Excel:
$R^2$ = 0.4106
This implies that 41% of the variation in Price is explained by the least-squares regression
line.
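
In simple linear regression, r and $R^2$ are tied together: r is the square root of $R^2$ with the sign of the slope. The short sketch below checks the reported values this way. The helper name correlation_from_r_squared is ours; the second call uses the highway values reported in Case 2. Small differences in the last digits come from rounding $R^2$ to four decimals.

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """r has the magnitude sqrt(R^2) and the sign of the fitted slope."""
    return math.copysign(math.sqrt(r_squared), slope)


r_city = correlation_from_r_squared(0.4106, -1.0538)
r_highway = correlation_from_r_squared(0.3723, -0.9928)
print(round(r_city, 3), round(r_highway, 3))
```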
Testing the Regression Model:
Step 1: Testing to make sure the requirements of the regression model are met.
Requirement 1: If the plot of residuals against the explanatory variable shows any
discernible pattern, then the linear model is not appropriate. Check the residuals plot below:

[Figure: Residuals Plot: Residuals versus City MPG]

The residuals do not follow any particular pattern, so our first requirement is not violated.
Requirement 2: The residuals have to be normally distributed. See the probability plot for the
residuals.

The residuals appear to be approximately normally distributed.
[Figure: Normal Probability Plot: Y versus Sample Percentile]
Step 2: Hypothesis test to determine whether a linear relation exists between price and city
mileage.
If $\beta_1$ is the slope of the regression line, we perform a two-tailed test to check whether
$\beta_1$ is equal to zero. If $\beta_1$ is equal to zero, then the assumption that a linear
relation exists between price and mileage is not valid:
Null Hypothesis
$H_0: \beta_1 = 0$
Alternate Hypothesis
$H_1: \beta_1 \neq 0$
>> The level of significance for this test is 0.05.

Calculating the Test statistic:
$$t_0 = \frac{b_1 - \beta_1}{s_{b_1}}$$
To compute the test statistic, we assume that the null hypothesis is true; hence we assume
$\beta_1 = 0$.

$$s_e = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = 6.843$$

$$s_{b_1} = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}} = 0.15545$$

Test statistic: $t_0 = \frac{-1.0538 - 0}{0.15545} = -6.78$
Classical Approach: The critical t-value at the 0.05 level of significance for a two-tailed test
with 66 degrees of freedom is approximately 1.997.
Given that the test statistic is to the left of -1.997, we reject the null hypothesis.
P-value approach: The P-value for the test is approximately $3.98 \times 10^{-9}$, which is less
than the level of significance 0.05; therefore we reject the null hypothesis.
Step 3: Conclusion
There is sufficient evidence at the 0.05 level of significance to conclude that a linear relation
exists between price and city mileage.
Confidence Intervals for the Slope of the Least Square Regression Line
The 95% confidence interval for the slope of the regression line is calculated in Excel:
The lower bound is -1.364, and the upper bound is -0.7435.
We are 95% confident that the mean change in price for each additional MPG of city mileage is
between -1.364 and -0.7435, that is, a decrease of between 0.7435 and 1.364.
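
Both the slope test and the confidence interval follow from the slope estimate and its standard error. The sketch below reproduces them from the reported values $b_1 = -1.0538$ and $s_{b_1} = 0.15545$, using the table value 1.997 for 66 degrees of freedom. The function name slope_inference is ours, and this is not the Excel output itself.

```python
def slope_inference(b1, s_b1, t_critical, beta1_null=0.0):
    """Return the t statistic for H0: beta1 = beta1_null and the CI for the slope."""
    t0 = (b1 - beta1_null) / s_b1
    margin = t_critical * s_b1
    return t0, (b1 - margin, b1 + margin)


T_CRITICAL = 1.997  # t table, two-tailed, alpha = 0.05, 66 degrees of freedom
t0, (lower, upper) = slope_inference(b1=-1.0538, s_b1=0.15545, t_critical=T_CRITICAL)
reject_null = abs(t0) > T_CRITICAL
print(round(t0, 2), round(lower, 3), round(upper, 3), reject_null)
```

The same function applies to the highway model by substituting that model's slope and standard error.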
Case 2: Least Squares Regression - Analyze relationship between Highway MPG and Price.
Below is the scatter plot with the line and the equation.

[Figure: Scatter Plot of Price versus Highway MPG, with fitted line y = -0.9928x + 47.796 and $R^2$ = 0.3723]

The equation of the line is:
y = -0.9928x + 47.796
Slope and Y-intercept:
The slope is -0.9928, which means that if highway mileage increases by 1 MPG, the predicted
price of the car decreases by about 0.99, on average. The two variables are negatively
associated.
The y-intercept is 47.796. Because 0 MPG, or values close to 0, do not make sense for mileage,
the y-intercept has no practical interpretation.
Linear correlation coefficient and the coefficient of determination:
>> The linear correlation coefficient r is calculated (using Excel):
r = -0.6102
This implies a moderate negative linear relation between price and highway mileage.
>> The coefficient of determination $R^2$ is also calculated in Excel:
$R^2$ = 0.3723
This implies that 37% of the variation in Price is explained by the least-squares regression
line.
Testing the Regression Model:
Step 1: Testing to make sure the requirements of the regression model are met.
Requirement 1: If the plot of residuals against the explanatory variable shows any
discernible pattern, then the linear model is not appropriate. Check the residuals plot below:

[Figure: Residuals Plot: Residuals versus Highway MPG]

The residuals do not follow any particular pattern, so our first requirement is not violated.
Requirement 2: The residuals have to be normally distributed. See the probability plot for the
residuals.

[Figure: Normal Probability Plot: Y versus Sample Percentile]

The residuals appear to be approximately normally distributed.
Step 2: Hypothesis test to determine whether a linear relation exists between price and highway
mileage.
If $\beta_1$ is the slope of the regression line, we perform a two-tailed test to check whether
$\beta_1$ is equal to zero. If $\beta_1$ is equal to zero, then the assumption that a linear
relation exists between price and mileage is not valid:
Null Hypothesis
$H_0: \beta_1 = 0$
Alternate Hypothesis
$H_1: \beta_1 \neq 0$
>> The level of significance for this test is 0.05.

Calculating the Test statistic:
$$t_0 = \frac{b_1 - \beta_1}{s_{b_1}}$$
To compute the test statistic, we assume that the null hypothesis is true; hence we assume
$\beta_1 = 0$.

$$s_e = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = 7.062$$

$$s_{b_1} = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}} = 0.1587$$

Test statistic: $t_0 = \frac{-0.9928 - 0}{0.1587} = -6.26$
Classical Approach: The critical t-value at the 0.05 level of significance for a two-tailed test
with 66 degrees of freedom is approximately 1.997.
Given that the test statistic is to the left of -1.997, we reject the null hypothesis.
P-value approach: The P-value for the test is approximately $3.31 \times 10^{-8}$, which is less
than the level of significance 0.05; therefore we reject the null hypothesis.
Step 3: Conclusion
There is sufficient evidence at the 0.05 level of significance to conclude that a linear relation
exists between price and highway mileage.
Confidence Intervals for the Slope of the Least Square Regression Line
The 95% confidence interval for the slope of the regression line is calculated in Excel:
The lower bound is -1.31, and the upper bound is -0.676.
We are 95% confident that the mean change in price for each additional MPG of highway mileage
is between -1.31 and -0.676, that is, a decrease of between 0.676 and 1.31.
Conclusion:
The three analyses yield meaningful information about what affects car price, which automobile
magazine readers should find interesting and useful when making their own assessments.
Based on the analysis at the 0.05 level of significance, we concluded in the first part that
the populations of US and Non-US cars are not homogeneous. That implies that the car types take
different shares in the two populations. This is especially evident for the Large car type,
which appears only in the US sample. This raises the question of whether Non-US manufacturers
produce this type of car at all. If not, then US customers who prefer Large cars can naturally
choose only from US-origin cars, regardless of price. The shares of the other car types do not
differ greatly between the two samples; in further analysis it would therefore be interesting to
test whether the two populations are homogeneous when the Large car type is left out. Because
the populations are not homogeneous, US and Non-US cars should not be compared directly without
taking car type into consideration.
Based on the analysis at the 0.05 level of significance, we concluded in the second part that
US cars are, on average, less expensive than Non-US-origin cars. For magazine readers, this
finding suggests that US cars are worth considering over Non-US cars, because they offer better
economic value and their purchase additionally supports national economic growth and
development.
Based on the regression analysis at the 0.05 level of significance, we concluded in the third
part that car fuel economy is related to car price. City fuel economy describes price better
than highway fuel economy. This conclusion was made by comparing the coefficients of
determination $R^2$ calculated for the two separate regression models, each built with one
explanatory variable: city fuel economy (in the first model) and highway fuel economy (in the
second model). In both cases $R^2$ came out low, 0.41 and 0.37, which suggests that other
important factors describe and influence car price. Possible explanatory variables include car
reliability, customer satisfaction with the car, and the car's aesthetics. In further analysis,
it would be interesting to analyze other factors that affect car price. If possible, formulating
a regression model that explains 90%-99% of the variation in car price could be useful. In the
regression analysis we concluded that fuel economy is negatively associated with car price,
implying that cars with lower fuel economy have higher prices. This can be explained by the fact
that larger car types consume more fuel (have lower fuel economy) and at the same time cost more
because of their greater overall size and engine power. It runs against the expectation that, in
the US market economy, more economical products command higher prices because of higher demand
for them. Further analysis could be done to verify the impact of fuel economy on the prices of
US and Non-US cars separately.
While a linear relationship exists between Price and both city and highway miles per gallon
(MPG), the increase in price that accompanies a decrease in MPG may be due to factors such as
weight, engine size, vehicle type and engineering. There is more variety among US-origin cars
than among Non-US-origin cars, and the foreign-manufactured cars were smaller than the domestic
ones.
In conclusion, we learned that analyzing consumer information involves many variables and
requires a thoughtful process for selecting the data the public truly cares about. For instance,
we do not know how much of a premium will be paid for a car with better gas mileage (MPG).
An idea for a future test is to compare cost, mileage and the number of cylinders, to examine
whether expensive, highly engineered cars get better gas mileage.
Another example is to compare weight, mileage and country of origin, to examine whether foreign
cars of the same weight have higher gas mileage (MPG) and may be engineered for higher fuel
economy, given that most Non-US countries have significantly higher gas prices than the US.
Lessons Learned:
Here is what the team members thought about the lessons learnt in the project work:
Team member 1: I learned about the real-life application of statistics during this course. The
questions we tried to answer in the project were of real-life significance.
I learned about the importance of collecting the sample properly and of choosing the descriptive
analysis based on the sample size and the type of study. I also learned to interpret the result
of a statistical analysis correctly.
Working with the group was very helpful, even though it was an online course. We were able to
benefit greatly from the collective knowledge of the group. Because it was an online course,
getting everyone together at the same time for a meeting was a challenge, although on-campus
courses have similar challenges.
Team member 2: It was very difficult to decide on the project title without a full
understanding of the theory. Because of the project deadlines, the team first had to decide on
the topic and scope of the analysis, at a time when the relevant material had not yet been
covered in the lesson plan. I found myself adjusting the analysis topic of my assigned part of
the project once the particular topic was covered in the lesson plan. For this reason, selecting
the topic was one of the most difficult and time-consuming tasks of the project. At the same
time, researching the topic helped me be better prepared for the upcoming lessons, which was a
great benefit of this experience.
- Regarding statistics as a subject, I learned that applying statistical instruments to
real-life data is not easy, and is often not possible with the desired analysis approach. Often
the population or sample structure does not meet the statistical requirements, so the selected
statistical analysis is not applicable. In conclusion, the applicability of statistical tools is
limited, and it is very important to ensure that the statistical requirements are met before
applying them. Otherwise the results of the test are meaningless and the time invested in the
analysis is wasted.
- The team structure, with an assigned project manager who tracked the status of the work, was
very effective; without this structure and the project manager's enforcement, the team project
might not have been completed. The deadlines and clear team member assignments set by the
project manager were definitely a crucial part of the assignment, especially considering that
the whole course is offered online.

Team member 3: The project was challenging and was a good way to learn how to apply
theory to real-life situations. It improved my understanding of the coursework and definitely
added value to the course. I think it is very important to have the project in the coursework.
However, the team project was not a perfect scenario for me. Although I prefer working in teams
to working individually, the idea of taking an online course was to learn new concepts on my own
schedule. Given that the team members were in different parts of the country, in different time
zones, with different schedules and different ways of working, teamwork in an online course
where all team members also hold jobs did not work out well. This was the biggest challenge. We
did overcome those challenges, so a few suggestions would be:
1. It is not justified to have 50% of the grade depend on a group project.
2. If the team has 5 participants, then the group project should be designed in a way
that it can be easily divided into 5 equal parts, with each person having a chance to add
to his or her learning and also work in a team.
3. An individual project is a better choice for an online course, in my opinion.

Overall, in spite of some complaints, I enjoyed working with my team and had a decent
experience with regard to the project.
Team member 4: - It was very difficult to scale and filter all the data so that our team
could break it down into segments. We had to choose the right data to compare and analyze, and
be decisive and strategic in order to make appropriate assessments for our project.
- I feel that a real-life project like this is always useful, whether for future use,
similar studies, or real-world application. The same theories can be applied to future buying
trends, knowledge of the market, or opinions based on hypothesis testing.
- Teamwork is essential. Our team celebrates its wonderful diversity of international students,
ages, genders, and locations. It is very important to communicate properly through a variety of
channels, including text messages, email, and phone calls. We had the
best team project manager, the quarterback of our team. She was easy to understand and
communicate with, and her e-mails and phone calls were directive, clear, and well
planned, with great operational management. Communication and operations were
essential to the success of our team.
Shanthi took on the role of our project leader; she helped lead the organization and mission. She
made a substantial impact as our team's project manager.
Other team members had similar opinions.













The first table is the transformed data that was used for the entire report.
Origin | Type | No. within Type | Manufacturer | Model | Passengers | Price_Category | Price | City MPG | CityMPG_Category | Price | Highway MPG | HighwayMPG_Category | Engine Size | Enginesize_Category | Horsepower | Horsepower_Category | Fuel Tank | Weight | Weight_Category
US | Small | 1 | Ford | Festiva | 4 | Cheap | 7.4 | 31 | Good | 7.4 | 33 | Good | 1.3 | Small | 63 | <100 | 10 | 1845 | < 1.5 Ton
non-US | Small | 6 | Mazda | 323 | 4 | Cheap | 8.3 | 29 | Good | 8.3 | 37 | Good | 1.6 | Small | 82 | <100 | 13.2 | 2325 | < 1.5 Ton
non-US | Small | 2 | Geo | Metro | 4 | Cheap | 8.4 | 46 | Good | 8.4 | 50 | Good | 1 | Small | 55 | <100 | 10.6 | 1695 | < 1.5 Ton
non-US | Small | 3 | Suzuki | Swift | 4 | Cheap | 8.6 | 39 | Good | 8.6 | 43 | Good | 1.3 | Small | 70 | <100 | 10.6 | 1965 | < 1.5 Ton
US | Small | 2 | Pontiac | LeMans | 4 | Cheap | 9 | 31 | Good | 9 | 41 | Good | 1.6 | Small | 74 | <100 | 13.2 | 2350 | < 1.5 Ton
non-US | Small | 5 | Volkswagen | Fox | 4 | Cheap | 9.1 | 25 | Good | 9.1 | 33 | Good | 1.8 | Small | 81 | <100 | 12.4 | 2240 | < 1.5 Ton
US | Small | 7 | Dodge | Colt | 5 | Cheap | 9.2 | 29 | Good | 9.2 | 33 | Good | 1.5 | Small | 92 | <100 | 13.2 | 2270 | < 1.5 Ton
non-US | Sporty | 2 | Hyundai | Scoupe | 4 | Cheap | 10 | 26 | Good | 10 | 34 | Good | 1.5 | Small | 92 | <100 | 11.9 | 2285 | < 1.5 Ton
US | Small | 5 | Ford | Escort | 5 | Cheap | 10.1 | 23 | Good | 10.1 | 30 | Good | 1.8 | Small | 127 | 100 to <150 | 13.2 | 2530 | < 1.5 Ton
non-US | Small | 12 | Subaru | Loyale | 5 | Cheap | 10.9 | 25 | Good | 10.9 | 30 | Good | 1.8 | Small | 90 | <100 | 15.9 | 2490 | < 1.5 Ton
US | Compact | 2 | Pontiac | Sunbird | 5 | Cheap | 11.1 | 23 | Good | 11.1 | 31 | Good | 2 | Medium | 110 | 100 to <150 | 15.2 | 2575 | < 1.5 Ton
US | Small | 3 | Saturn | SL | 5 | Cheap | 11.1 | 28 | Good | 11.1 | 38 | Good | 1.9 | Small | 85 | <100 | 12.8 | 2495 | < 1.5 Ton
US | Compact | 4 | Ford | Tempo | 5 | Cheap | 11.3 | 22 | Good | 11.3 | 27 | Poor | 2.3 | Medium | 96 | <100 | 15.9 | 2690 | < 1.5 Ton
US | Small | 6 | Dodge | Shadow | 5 | Cheap | 11.3 | 23 | Good | 11.3 | 29 | Poor | 2.2 | Medium | 93 | <100 | 14 | 2670 | < 1.5 Ton
US | Compact | 3 | Chevrolet | Corsica | 5 | Cheap | 11.4 | 25 | Good | 11.4 | 34 | Good | 2.2 | Medium | 110 | 100 to <150 | 15.6 | 2785 | < 1.5 Ton
non-US | Small | 8 | Mazda | Protege | 5 | Cheap | 11.6 | 28 | Good | 11.6 | 36 | Good | 1.8 | Small | 103 | 100 to <150 | 14.5 | 2440 | < 1.5 Ton
non-US | Small | 7 | Nissan | Sentra | 5 | Cheap | 11.8 | 29 | Good | 11.8 | 33 | Good | 1.6 | Small | 110 | 100 to <150 | 13.2 | 2545 | < 1.5 Ton
US | Small | 4 | Eagle | Summit | 5 | Cheap | 12.2 | 29 | Good | 12.2 | 33 | Good | 1.5 | Small | 92 | <100 | 13.2 | 2295 | < 1.5 Ton
non-US | Sporty | 4 | Geo | Storm | 4 | Cheap | 12.5 | 30 | Good | 12.5 | 36 | Good | 1.6 | Small | 90 | <100 | 12.4 | 2475 | < 1.5 Ton
US | Compact | 7 | Dodge | Spirit | 6 | Cheap | 13.3 | 22 | Good | 13.3 | 27 | Poor | 2.5 | Medium | 100 | 100 to <150 | 16 | 2970 | < 1.5 Ton
US | Compact | 5 | Chevrolet | Cavalier | 5 | Cheap | 13.4 | 25 | Good | 13.4 | 36 | Good | 2.2 | Medium | 110 | 100 to <150 | 15.2 | 2490 | < 1.5 Ton
US | Compact | 1 | Oldsmobile | Achieva | 5 | Cheap | 13.5 | 24 | Good | 13.5 | 31 | Good | 2.3 | Medium | 155 | 150 to <200 | 15.2 | 2910 | < 1.5 Ton
non-US | Midsize | 8 | Hyundai | Sonata | 5 | Cheap | 13.9 | 20 | Poor | 13.9 | 27 | Poor | 2 | Medium | 128 | 100 to <150 | 17.2 | 2885 | < 1.5 Ton
US | Sporty | 7 | Plymouth | Laser | 4 | Cheap | 14.4 | 23 | Good | 14.4 | 30 | Good | 1.8 | Small | 92 | <100 | 15.9 | 2640 | < 1.5 Ton
US | Midsize | 3 | Mercury | Cougar | 5 | Cheap | 14.9 | 19 | Poor | 14.9 | 26 | Poor | 3.8 | Large | 140 | 100 to <150 | 18 | 3610 | > 1.5 Ton
US | Sporty | 8 | Chevrolet | Camaro | 4 | Moderate | 15.1 | 19 | Poor | 15.1 | 28 | Poor | 3.4 | Large | 160 | 150 to <200 | 15.5 | 3240 | > 1.5 Ton
US | Midsize | 7 | Dodge | Dynasty | 6 | Moderate | 15.6 | 21 | Good | 15.6 | 27 | Poor | 2.5 | Medium | 100 | 100 to <150 | 16 | 3080 | > 1.5 Ton
non-US | Compact | 2 | Nissan | Altima | 5 | Moderate | 15.7 | 24 | Good | 15.7 | 30 | Good | 2.4 | Medium | 150 | 150 to <200 | 15.9 | 3050 | > 1.5 Ton
US | Midsize | 8 | Buick | Century | 6 | Moderate | 15.7 | 22 | Good | 15.7 | 31 | Good | 2.2 | Medium | 110 | 100 to <150 | 16.4 | 2880 | < 1.5 Ton
US | Compact | 6 | Chrysler | LeBaron | 6 | Moderate | 15.8 | 23 | Good | 15.8 | 28 | Poor | 3 | Large | 141 | 100 to <150 | 16 | 3085 | > 1.5 Ton
US | Midsize | 10 | Chevrolet | Lumina | 6 | Moderate | 15.9 | 21 | Good | 15.9 | 29 | Poor | 2.2 | Medium | 110 | 100 to <150 | 16.5 | 3195 | > 1.5 Ton
US | Sporty | 3 | Ford | Mustang | 4 | Moderate | 15.9 | 22 | Good | 15.9 | 29 | Poor | 2.3 | Medium | 105 | 100 to <150 | 15.4 | 2850 | < 1.5 Ton
US | Midsize | 1 | Oldsmobile | Cutlass_Ci | 5 | Moderate | 16.3 | 23 | Good | 16.3 | 31 | Good | 2.2 | Medium | 110 | 100 to <150 | 16.5 | 2890 | < 1.5 Ton
US | Van | 4 | Chevrolet | Lumina_APV | 7 | Moderate | 16.3 | 18 | Poor | 16.3 | 23 | Poor | 3.8 | Large | 170 | 150 to <200 | 20 | 3715 | > 1.5 Ton
non-US | Compact | 3 | Mazda | 626 | 5 | Moderate | 16.5 | 26 | Good | 16.5 | 34 | Good | 2.5 | Medium | 164 | 150 to <200 | 15.5 | 2970 | < 1.5 Ton
non-US | Compact | 1 | Honda | Accord | 4 | Moderate | 17.5 | 24 | Good | 17.5 | 31 | Good | 2.2 | Medium | 140 | 100 to <150 | 17 | 3040 | > 1.5 Ton
US | Sporty | 5 | Pontiac | Firebird | 4 | Moderate | 17.7 | 19 | Poor | 17.7 | 28 | Poor | 3.4 | Large | 160 | 150 to <200 | 15.5 | 3240 | > 1.5 Ton
non-US | Midsize | 6 | Toyota | Camry | 5 | Moderate | 18.2 | 22 | Good | 18.2 | 29 | Poor | 2.2 | Medium | 130 | 100 to <150 | 18.5 | 3030 | > 1.5 Ton
non-US | Sporty | 5 | Toyota | Celica | 4 | Moderate | 18.4 | 25 | Good | 18.4 | 32 | Good | 2.2 | Medium | 135 | 100 to <150 | 15.9 | 2950 | < 1.5 Ton
US | Midsize | 5 | Pontiac | Grand_Prix | 5 | Moderate | 18.5 | 19 | Poor | 18.5 | 27 | Poor | 3.4 | Large | 200 | >=200 | 16.5 | 3450 | > 1.5 Ton
US | Van | 1 | Dodge | Caravan | 7 | Moderate | 19 | 17 | Poor | 19 | 21 | Poor | 3 | Large | 142 | 100 to <150 | 20 | 3705 | > 1.5 Ton
non-US | Van | 2 | Mazda | MPV | 7 | Moderate | 19.1 | 18 | Poor | 19.1 | 24 | Poor | 3 | Large | 155 | 150 to <200 | 19.6 | 3735 | > 1.5 Ton
non-US | Van | 4 | Nissan | Quest | 7 | Moderate | 19.1 | 17 | Poor | 19.1 | 23 | Poor | 3 | Large | 151 | 150 to <200 | 20 | 4100 | > 1.5 Ton
non-US | Compact | 9 | Subaru | Legacy | 5 | Moderate | 19.5 | 23 | Good | 19.5 | 30 | Good | 2.2 | Medium | 130 | 100 to <150 | 15.9 | 3085 | > 1.5 Ton
US | Van | 2 | Oldsmobile | Silhouette | 7 | Moderate | 19.5 | 18 | Poor | 19.5 | 23 | Poor | 3.8 | Large | 170 | 150 to <200 | 20 | 3715 | > 1.5 Ton
non-US | Van | 1 | Volkswagen | Eurovan | 7 | Moderate | 19.7 | 17 | Poor | 19.7 | 21 | Poor | 2.5 | Medium | 109 | 100 to <150 | 21.1 | 3960 | > 1.5 Ton
non-US | Sporty | 3 | Honda | Prelude | 4 | Moderate | 19.8 | 24 | Good | 19.8 | 31 | Good | 2.3 | Medium | 160 | 150 to <200 | 15.9 | 2865 | < 1.5 Ton
US | Van | 3 | Ford | Aerostar | 7 | Moderate | 19.9 | 15 | Poor | 19.9 | 20 | Poor | 3 | Large | 145 | 100 to <150 | 21 | 3735 | > 1.5 Ton
US | Midsize | 6 | Ford | Taurus | 5 | Expensive | 20.2 | 21 | Good | 20.2 | 30 | Good | 3 | Large | 140 | 100 to <150 | 16 | 3325 | > 1.5 Ton
non-US | Midsize | 3 | Nissan | Maxima | 5 | Expensive | 21.5 | 21 | Good | 21.5 | 26 | Poor | 3 | Large | 160 | 150 to <200 | 18.5 | 3200 | > 1.5 Ton
non-US | Compact | 7 | Volvo | 240 | 5 | Expensive | 22.7 | 21 | Good | 22.7 | 28 | Poor | 2.3 | Medium | 114 | 100 to <150 | 15.8 | 2985 | < 1.5 Ton
non-US | Van | 3 | Toyota | Previa | 7 | Expensive | 22.7 | 18 | Poor | 22.7 | 22 | Poor | 2.4 | Medium | 138 | 100 to <150 | 19.8 | 3785 | > 1.5 Ton
non-US | Sporty | 6 | Volkswagen | Corrado | 4 | Expensive | 23.3 | 18 | Poor | 23.3 | 25 | Poor | 2.8 | Medium | 178 | 150 to <200 | 18.5 | 2810 | < 1.5 Ton
US | Sporty | 2 | Dodge | Stealth | 4 | Expensive | 25.8 | 18 | Poor | 25.8 | 24 | Poor | 3 | Large | 300 | >=200 | 19.8 | 3805 | > 1.5 Ton
non-US | Midsize | 4 | Mitsubishi | Diamante | 5 | Expensive | 26.1 | 18 | Poor | 26.1 | 24 | Poor | 3 | Large | 202 | >=200 | 19 | 3730 | > 1.5 Ton
US | Midsize | 4 | Buick | Riviera | 5 | Expensive | 26.3 | 19 | Poor | 26.3 | 27 | Poor | 3.8 | Large | 170 | 150 to <200 | 18.8 | 3495 | > 1.5 Ton
non-US | Midsize | 10 | Lexus | ES300 | 5 | Expensive | 28 | 18 | Poor | 28 | 24 | Poor | 3 | Large | 185 | 150 to <200 | 18.5 | 3510 | > 1.5 Ton
non-US | Compact | 8 | Saab | 900 | 5 | Expensive | 28.7 | 20 | Poor | 28.7 | 26 | Poor | 2.1 | Medium | 140 | 100 to <150 | 18 | 2775 | < 1.5 Ton
non-US | Compact | 5 | Audi | 90 | 5 | Expensive | 29.1 | 20 | Poor | 29.1 | 26 | Poor | 2.8 | Medium | 172 | 150 to <200 | 16.9 | 3375 | > 1.5 Ton
non-US | Midsize | 2 | BMW | 535i | 4 | Expensive | 30 | 22 | Good | 30 | 30 | Good | 3.5 | Large | 208 | >=200 | 21.1 | 3640 | > 1.5 Ton
non-US | Sporty | 1 | Mazda | RX-7 | 2 | Expensive | 32.5 | 17 | Poor | 32.5 | 25 | Poor | 1.3 | Small | 255 | >=200 | 20 | 2895 | < 1.5 Ton
non-US | Midsize | 5 | Acura | Legend | 5 | Expensive | 33.9 | 18 | Poor | 33.9 | 25 | Poor | 3.2 | Large | 200 | >=200 | 18 | 3560 | > 1.5 Ton
US | Midsize | 9 | Lincoln | Continenta | 6 | Expensive | 34.3 | 17 | Poor | 34.3 | 26 | Poor | 3.8 | Large | 160 | 150 to <200 | 18.4 | 3695 | > 1.5 Ton
non-US | Midsize | 1 | Lexus | SC300 | 4 | Expensive | 35.2 | 18 | Poor | 35.2 | 23 | Poor | 3 | Large | 225 | >=200 | 20.6 | 3515 | > 1.5 Ton
non-US | Midsize | 11 | Audi | 100 | 6 | Expensive | 37.7 | 19 | Poor | 37.7 | 26 | Poor | 2.8 | Medium | 172 | 150 to <200 | 21.1 | 3405 | > 1.5 Ton
US | Sporty | 1 | Chevrolet | Corvette | 2 | Expensive | 38 | 17 | Poor | 38 | 25 | Poor | 5.7 | Large | 300 | >=200 | 20 | 3380 | > 1.5 Ton
US | Midsize | 2 | Cadillac | Seville | 5 | Expensive | 40.1 | 16 | Poor | 40.1 | 25 | Poor | 4.6 | Large | 295 | >=200 | 20 | 3935 | > 1.5 Ton
non-US | Midsize | 9 | Infiniti | Q45 | 5 | Expensive | 47.9 | 17 | Poor | 47.9 | 22 | Poor | 4.5 | Large | 278 | >=200 | 22.5 | 4000 | > 1.5 Ton








For Question 1 Chi-Square.
Table 1 (Distribution of Non US based Cars and Types )
Origin Type Car Type (Numeric)
non-US Compact 1
non-US Compact 1
non-US Compact 1
non-US Compact 1
non-US Compact 1
non-US Compact 1
non-US Compact 1
non-US Compact 1
non-US Compact 1
non-US Midsize 3
non-US Midsize 3
non-US Midsize 3
non-US Midsize 3
non-US Midsize 3
non-US Midsize 3
non-US Midsize 3
non-US Midsize 3
non-US Midsize 3
non-US Midsize 3
non-US Midsize 3
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Small 2
non-US Sporty 4
non-US Sporty 4
non-US Sporty 4
non-US Sporty 4
non-US Sporty 4
non-US Sporty 4
non-US Van 5
non-US Van 5
non-US Van 5
non-US Van 5
Table 2 (Distribution of US based Cars and Types)
Origin Type Car Type (Numeric)
US Compact 1
US Compact 1
US Compact 1
US Compact 1
US Compact 1
US Compact 1
US Compact 1
US Large 6
US Large 6
US Large 6
US Large 6
US Large 6
US Large 6
US Large 6
US Large 6
US Large 6
US Large 6
US Large 6
US Midsize 3
US Midsize 3
US Midsize 3
US Midsize 3
US Midsize 3
US Midsize 3
US Midsize 3
US Midsize 3
US Midsize 3
US Midsize 3
US Small 2
US Small 2
US Small 2
US Small 2
US Small 2
US Small 2
US Small 2
US Sporty 4
US Sporty 4
US Sporty 4
US Sporty 4
US Sporty 4
US Sporty 4
US Sporty 4
US Sporty 4
US Van 5
US Van 5
US Van 5
US Van 5
US Van 5
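
The two tables above can be tallied into a contingency table of origin by car type. The following is a minimal sketch in plain Python of a chi-square test of homogeneity on those tallies; it is not the report's own computation. The counts are obtained by counting the rows listed above, the names `counts` and `chi_square_statistic` are illustrative, and keeping the Large column (which contains no non-US cars) is an assumption, since the report may have treated that category differently.

```python
# Counts of cars per type, tallied from Table 1 (non-US) and Table 2 (US).
CAR_TYPES = ["Compact", "Large", "Midsize", "Small", "Sporty", "Van"]
counts = {
    "non-US": [9, 0, 11, 14, 6, 4],
    "US": [7, 11, 10, 7, 8, 5],
}


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom, expected counts) for a dict of rows."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    # Expected count under homogeneity: row total * column total / grand total.
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(rows, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


statistic, dof, expected = chi_square_statistic(counts)
print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
for origin, exp_row in zip(counts, expected):
    print(origin, dict(zip(CAR_TYPES, (round(e, 2) for e in exp_row))))
```

With 5 degrees of freedom, the statistic would be compared against the chi-square critical value of about 11.07 at the 5% significance level. Printing the expected counts also makes it easy to check the usual rule of thumb that each expected count should be at least 5.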

For Question 2: Hypothesis test for two means:
Table 1 (Sample population of Non US originated cars)
Origin Size Passengers Price
non-US Compact 4 17.5
non-US Compact 5 15.7
non-US Compact 5 16.5
non-US Compact 5 29.1
non-US Compact 5 22.7
non-US Compact 5 28.7
non-US Compact 5 19.5
non-US Midsize 4 35.2
non-US Midsize 4 30
non-US Midsize 5 21.5
non-US Midsize 5 26.1
non-US Midsize 5 33.9
non-US Midsize 5 18.2
non-US Midsize 5 13.9
non-US Midsize 5 47.9
non-US Midsize 5 28
non-US Midsize 6 37.7
non-US Small 4 8.4
non-US Small 4 8.6
non-US Small 4 9.1
non-US Small 4 8.3
non-US Small 5 11.8
non-US Small 5 11.6
non-US Small 5 10.9
non-US Sporty 2 32.5
non-US Sporty 4 10
non-US Sporty 4 19.8
non-US Sporty 4 12.5
non-US Sporty 4 18.4
non-US Sporty 4 23.3
non-US Van 7 19.7
non-US Van 7 19.1
non-US Van 7 22.7
non-US Van 7 19.1


Table 2 (Sample population of US originated cars)
Origin Size Passengers Price
US Compact 5 13.5
US Compact 5 11.1
US Compact 5 11.4
US Compact 5 11.3
US Compact 5 13.4
US Compact 6 15.8
US Compact 6 13.3
US Midsize 5 16.3
US Midsize 5 40.1
US Midsize 5 14.9
US Midsize 5 26.3
US Midsize 5 18.5
US Midsize 5 20.2
US Midsize 6 15.6
US Midsize 6 15.7
US Midsize 6 34.3
US Midsize 6 15.9
US Small 4 7.4
US Small 4 9
US Small 5 11.1
US Small 5 12.2
US Small 5 10.1
US Small 5 11.3
US Small 5 9.2
US Sporty 2 38
US Sporty 4 25.8
US Sporty 4 15.9
US Sporty 4 17.7
US Sporty 4 14.4
US Sporty 4 15.1
US Van 7 19
US Van 7 19.5
US Van 7 19.9
US Van 7 16.3
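
As a rough illustration of how the two price samples above can be compared, the sketch below computes the sample means and Welch's two-sample t statistic using only the Python standard library. The prices are copied from the two tables; the choice of Welch's unequal-variance form and the function name `welch_t` are assumptions made for this example, and the report's own test may have used a different variant (for example a pooled-variance t test or a z test).

```python
from math import sqrt
from statistics import mean, variance

# Prices copied from Table 1 (non-US) and Table 2 (US), in table order.
non_us_prices = [
    17.5, 15.7, 16.5, 29.1, 22.7, 28.7, 19.5,                   # Compact
    35.2, 30, 21.5, 26.1, 33.9, 18.2, 13.9, 47.9, 28, 37.7,     # Midsize
    8.4, 8.6, 9.1, 8.3, 11.8, 11.6, 10.9,                       # Small
    32.5, 10, 19.8, 12.5, 18.4, 23.3,                           # Sporty
    19.7, 19.1, 22.7, 19.1,                                     # Van
]
us_prices = [
    13.5, 11.1, 11.4, 11.3, 13.4, 15.8, 13.3,                   # Compact
    16.3, 40.1, 14.9, 26.3, 18.5, 20.2, 15.6, 15.7, 34.3, 15.9, # Midsize
    7.4, 9, 11.1, 12.2, 10.1, 11.3, 9.2,                        # Small
    38, 25.8, 15.9, 17.7, 14.4, 15.1,                           # Sporty
    19, 19.5, 19.9, 16.3,                                       # Van
]


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances
    se_sq = var_a / n_a + var_b / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t_stat, dof


t_stat, dof = welch_t(non_us_prices, us_prices)
print(f"mean non-US = {mean(non_us_prices):.2f}, mean US = {mean(us_prices):.2f}")
print(f"t = {t_stat:.3f}, approximate degrees of freedom = {dof:.1f}")
```

The sign of `t_stat` follows the order of the arguments: a positive value means the non-US sample mean is higher than the US sample mean.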




For Question 3: Regression Analysis
RESIDUAL OUTPUT-City MPG



Observation Predicted Y Residuals
1 17.25863325 0.241366754
2 17.25863325 -1.558633246
3 15.15098622 1.349013777
4 21.47392729 7.626072707
5 20.42010378 2.279896219
6 21.47392729 7.226072707
7 18.31245676 1.187543242
8 23.58157432 11.61842568
9 19.36628027 10.63371973
10 20.42010378 1.079896219
11 23.58157432 2.518425684
12 23.58157432 10.31842568
13 19.36628027 -1.166280269
14 21.47392729 -7.573927293
15 24.63539783 23.26460217
16 23.58157432 4.418425684
17 22.5277508 15.1722492
18 -5.925484008 14.32548401
19 1.451280573 7.148719427
20 16.20480973 -7.104809735
21 11.98951569 -3.689515689
22 11.98951569 -0.189515689
23 13.0433392 -1.4433392
24 16.20480973 -5.304809735
25 24.63539783 7.864602173
26 15.15098622 -5.150986223
27 17.25863325 2.541366754
28 10.93569218 1.564307823
29 16.20480973 2.195190265
30 23.58157432 -0.281574316
31 24.63539783 -4.935397827
32 23.58157432 -4.481574316
33 23.58157432 -0.881574316
34 24.63539783 -5.535397827
35 17.25863325 -3.758633246
36 18.31245676 -7.212456758
37 16.20480973 -4.804809735
38 19.36628027 -8.066280269
39 16.20480973 -2.804809735
40 18.31245676 -2.512456758
41 19.36628027 -6.066280269
42 18.31245676 -2.012456758
43 25.68922134 14.41077866
44 22.5277508 -7.627750804
45 22.5277508 3.772249196
46 22.5277508 -4.027750804
47 20.42010378 -0.220103781
48 20.42010378 -4.820103781
49 19.36628027 -3.666280269
50 24.63539783 9.664602173
51 20.42010378 -4.520103781
52 9.881868665 -2.481868665
53 9.881868665 -0.881868665
54 13.0433392 -1.9433392
55 11.98951569 0.210484311
56 18.31245676 -8.212456758
57 18.31245676 -7.012456758
58 11.98951569 -2.789515689
59 24.63539783 13.36460217
60 23.58157432 2.218425684
61 19.36628027 -3.466280269
62 22.5277508 -4.827750804
63 18.31245676 -3.912456758
64 22.5277508 -7.427750804
65 24.63539783 -5.635397827
66 23.58157432 -4.081574316
67 26.74304485 -6.84304485
68 23.58157432 -7.281574316
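
A residual is the observed value minus the predicted value, so adding the two columns above should give back the observed price for each car. The sketch below checks this for a few rows. It assumes, based on the matching values, that the observations follow the order of the Question 2 tables (non-US cars first, then US cars); the variable names are illustrative only.

```python
# (observation, predicted price, residual) copied from the City MPG table above.
rows = [
    (1, 17.25863325, 0.241366754),
    (2, 17.25863325, -1.558633246),
    (3, 15.15098622, 1.349013777),
    (18, -5.925484008, 14.32548401),
    (35, 17.25863325, -3.758633246),
]

# Prices listed for the same positions in the Question 2 tables
# (assumed ordering: observations 1-34 non-US, 35-68 US).
listed_prices = {1: 17.5, 2: 15.7, 3: 16.5, 18: 8.4, 35: 13.5}

# observed = predicted + residual, rounded to the one decimal used for prices.
observed = {obs: round(pred + resid, 1) for obs, pred, resid in rows}

for obs, price in observed.items():
    status = "matches" if price == listed_prices[obs] else "differs from"
    print(f"observation {obs}: recovered price {price} {status} the listed price")
```

Observation 18 has a negative predicted price, which suggests that the straight-line fit extrapolates poorly for cars with very high city MPG.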

RESIDUAL OUTPUT- Highway MPG



Observation Predicted Y Residuals
1 17.0197627 0.480237
2 18.01255764 -2.31256
3 14.0413779 2.458622
4 21.98373737 7.116263
5 19.9981475 2.701852
6 21.98373737 6.716263
7 18.01255764 1.487442
8 24.96212217 10.23788
9 18.01255764 11.98744
10 21.98373737 -0.48374
11 23.96932724 2.130673
12 22.9765323 10.92347
13 19.00535257 -0.80535
14 20.99094244 -7.09094
15 25.9549171 21.94508
16 23.96932724 4.030673
17 21.98373737 15.71626
18 -1.84334103 10.24334
19 5.106223503 3.493776
20 15.03417284 -5.93417
21 11.0629931 -2.76299
22 15.03417284 -3.23417
23 12.05578804 -0.45579
24 18.01255764 -7.11256
25 22.9765323 9.523468
26 14.0413779 -4.04138
27 17.0197627 2.780237
28 12.05578804 0.444212
29 16.02696777 2.373032
30 22.9765323 0.323468
31 26.94771203 -7.24771
32 23.96932724 -4.86933
33 25.9549171 -3.25492
34 24.96212217 -5.86212
35 17.0197627 -3.51976
36 17.0197627 -5.91976
37 14.0413779 -2.64138
38 20.99094244 -9.69094
39 12.05578804 1.344212
40 19.9981475 -4.19815
41 20.99094244 -7.69094
42 17.0197627 -0.71976
43 22.9765323 17.12347
44 21.98373737 -7.08374
45 20.99094244 5.309058
46 20.99094244 -2.49094
47 18.01255764 2.187442
48 20.99094244 -5.39094
49 17.0197627 -1.31976
50 21.98373737 12.31626
51 19.00535257 -3.10535
52 15.03417284 -7.63417
53 7.091813369 1.908187
54 10.07019817 1.029802
55 15.03417284 -2.83417
56 18.01255764 -7.91256
57 19.00535257 -7.70535
58 15.03417284 -5.83417
59 22.9765323 15.02347
60 23.96932724 1.830673
61 19.00535257 -3.10535
62 19.9981475 -2.29815
63 18.01255764 -3.61256
64 19.9981475 -4.89815
65 26.94771203 -7.94771
66 24.96212217 -5.46212
67 27.94050697 -8.04051
68 24.96212217 -8.66212
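
Because every predicted value in a simple linear regression lies on one straight line, the fitted line can be recovered from any two predictions at different MPG values. The sketch below does this for the Highway MPG output. It assumes that observations 1, 2 and 3 are the first three non-US compact cars of the Question 2 table (Honda Accord, Nissan Altima and Mazda 626, with highway MPG of 31, 30 and 34 in the transformed data table); the names `slope`, `intercept` and `predict` are illustrative.

```python
# (highway MPG, predicted price) for observations 1 and 2.
# The MPG values are read from the transformed data table (assumed car order).
x1, y1 = 31, 17.0197627
x2, y2 = 30, 18.01255764

# Line through the two predicted points.
slope = (y2 - y1) / (x2 - x1)
intercept = y1 - slope * x1


def predict(highway_mpg):
    """Predicted price for a given highway MPG on the recovered line."""
    return intercept + slope * highway_mpg


print(f"slope = {slope:.5f} per MPG, intercept = {intercept:.3f}")
# Observation 3 (34 highway MPG) is listed with predicted Y = 14.0413779.
print(f"prediction at 34 MPG = {predict(34):.7f}")
```

A negative slope means the predicted price falls as highway MPG rises, which is consistent with the sign pattern of the predictions in the table.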
