AC09-Tut2-ANOVA Report - 2

HANOI UNIVERSITY
FACULTY OF MANAGEMENT AND TOURISM

STATISTICS FOR ECONOMICS
ANOVA Case Study

THE UNEMPLOYMENT RATE OF URBAN AREA
IN EIGHT REGIONS IN VIETNAM
Tutor: Ms. Lê Thị Ngọc Tú

Tutorial 2 – AC09
Group members:
Lê Thu Hằng ID: 0904010026
Lê Thị Thanh Tâm ID: 0904010093
Bùi Thị Thanh Hà ID: 0904010018
Lưu Hồng Hạnh ID: 0904010024
Lê Phương Quyên ID: 0904010090
Nguyễn Thị Hà Giang ID: 0804010015
Phan Hoàng Minh Thúy ID: 0804040107
1. INTRODUCTION
Losing a job can be one of the most upsetting events in a person's life. A job loss means a
lower living standard in the present, anxiety about the future, and reduced self-esteem.
Unemployment affects not only the individual but also the economy as a whole. As Vietnam
develops, reducing unemployment has become a focus and a priority for the growth of the
country's economy.
The unemployment rate is one of the economic indicators used to assess the general state of
the economy and its potential for growth. Based on the unemployment rate, economists
assess the economic climate and analyze where the country is heading in terms of jobs and
outlook. The unemployment rate also gives job seekers an idea of how competitive the job
market is and helps them decide on an appropriate course of action. Depending on the
unemployment figures, the government may step in and offer assistance to jump-start the
economy.
The unemployment rate is affected by many factors, such as geography, education and
custom. Therefore, given our concern about the unemployment situation in Vietnam, we
carried out a project to check whether the unemployment rate differs from region to region
in Vietnam, and thereby to assess whether the geographic factor is associated with
differences between these regions.
To answer this question, we chose analysis of variance (ANOVA) as the statistical method.
Specifically, we test the equality of the mean annual unemployment rate in urban areas
across eight regions: Northwest (Tay Bac), Northeast (Dong Bac), Red River Delta (Dong
Bang Song Hong), North Central Coast (Bac Trung Bo), South Central Coast (Nam Trung
Bo), Central Highlands (Tay Nguyen), Southeast (Dong Nam Bo) and Mekong River Delta
(Dong Bang Song Cuu Long). The results of this analysis may help inform recommendations
to the government on the unemployment issue, a critical economic problem.
After analyzing and testing the collected data, we found sufficient statistical evidence to
conclude that the unemployment rates of these regions differ. Specifically, because the value
of the test statistic (F = 6.05) exceeds the critical value ($F_{0.05,7,64} \approx 2.16$), we
reject Ho.
2. RESEARCH METHODOLOGY
2.1 POPULATIONS
For this analysis, the experimental units are the years in which the unemployment rate was
recorded. The response variable is the annual unemployment rate in urban areas for each
region. The factor that defines the populations is the region, so there are eight populations
corresponding to the eight regions: Northwest, Northeast, Red River Delta, North Central
Coast, South Central Coast, Central Highlands, Southeast and Mekong River Delta.
2.2 SAMPLE SIZE

Because we use single-factor analysis of variance to compare the means of eight populations,
we draw a sample of 9 annual observations, covering 2000 to 2008, from each population.
The total sample size is therefore 72.
2.3 DATA COLLECTION

We collected data on the unemployment rate of the labor force in urban areas by region from
2000 to 2008 by downloading the file available on the official website of the Vietnam
Ministry of Labour, Invalids and Social Affairs. The table below lists the treatments and
responses:
Table 1: Annual unemployment rate in eight regions (%)

Year  Red River  Northeast  Northwest  North Central  South Central  Central    Southeast  Mekong River
      Delta                            Coast          Coast          Highlands             Delta
2000  7.34       6.49       6.02       6.87           6.31           5.16       6.16       6.15
2001  7.07       6.73       5.62       6.72           6.16           5.55       5.92       6.08
2002  6.64       6.10       5.11       5.82           5.50           4.90       6.30       5.50
2003  6.38       5.93       5.19       5.45           5.46           4.39       6.08       5.26
2004  6.03       5.41       5.41       5.56           5.56           4.53       5.92       5.03
2005  5.61       5.07       5.07       5.20           5.20           4.23       5.62       4.87
2006  6.42       4.18       4.18       5.50           5.50           2.38       5.47       4.52
2007  5.74       3.85       3.85       4.95           4.95           2.11       4.83       4.03
2008  5.35       4.17       4.17       4.77           4.77           2.51       4.89       4.12
2.4 DATA PROCESSING

After the data were collected, we used Microsoft Excel to process the figures. First, we
produced histograms with the Histogram tool of Excel's data analysis tools to check the
normality requirement. Then we used an F-test to check whether the population variances
differ. Once the required conditions had been examined, we used the ANOVA: single factor
command to conduct the analysis of variance and produce the ANOVA table presented later
in the ANOVA and discussion section.
2.5 SIGNIFICANCE LEVEL OF TEST

➢ α = 0.05
Since the level of significance is the probability of making a Type I error, we chose the
common significance level α = 0.05 for the ANOVA and all supporting tests in this case
study. This means we accept a 5 in 100 chance of rejecting a null hypothesis that is in fact
true. This conventional level keeps the risk of a Type I error low, although it does not by
itself guarantee the accuracy of the ANOVA and the other tests.
3. ANOVA AND DISCUSSION
3.1 ASSUMPTIONS FOR ONE-WAY ANOVA

One-way ANOVA using the F-test is applied to quantitative data from independent
populations, and its purpose is to compare two or more population means. This method
requires the following assumptions:
➢ The populations are normally distributed

➢ The population variances are equal
When these conditions hold, the significance level and the conclusion of the F-test can be
relied on.
3.2 CHECKING REQUIRED CONDITIONS

Because we use eight geographic regions to define the eight populations, we treat these
populations as independent.
Check the normality of the populations
We have eight populations representing eight regions in Vietnam. For each population, we
check normality by producing a histogram of the sample drawn from that population. As
none of the eight histograms is extremely non-normal, we assume that the eight populations
are normally distributed.
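The histogram check described above can be imitated outside Excel. The following is a minimal sketch in plain Python, assuming the Red River Delta sample from Table 1 and an arbitrary choice of four equal-width bins; the helper name `bin_counts` is hypothetical, and the bins Excel produced for the report may differ.

```python
# Text histogram of one sample from Table 1 (Red River Delta, 2000-2008).
# The bin count of 4 is an assumption chosen for illustration only.
rates = [7.34, 7.07, 6.64, 6.38, 6.03, 5.61, 6.42, 5.74, 5.35]


def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return low, width, counts


low, width, counts = bin_counts(rates, 4)
for i, c in enumerate(counts):
    left = low + i * width
    print(f"{left:5.2f} to {left + width:5.2f} | {'*' * c}")
```

With only nine observations per region, such a histogram can reveal gross departures from normality at best, which is why the report treats normality as an assumption.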
Check the equality of the population variances
To save time, we test only the difference between the lowest and the highest sample
variances. If these two do not differ significantly, we can believe that the variances of all
eight populations are equal.
Table 2: Summary of eight samples

Groups               Count  Sum    Average      Variance
Red River Delta      9      56.58  6.286666667  0.4468
Northeast            9      47.93  5.325555556  1.148852778
Northwest            9      44.62  4.957777778  0.539219444
North Central Coast  9      50.84  5.648888889  0.524961111
South Central Coast  9      49.41  5.49         0.252675
Central Highlands    9      35.76  3.973333333  1.681775
Southeast            9      51.19  5.687777778  0.285469444
Mekong River Delta   9      45.56  5.062222222  0.591894444
From the summary table, the South Central Coast has the lowest sample variance (0.25) and
the Central Highlands has the highest (1.68). The F-test for population variances, detailed in
the appendix, gives sufficient statistical evidence to conclude at the 5% level of significance
that the population variances of these two treatments are unequal. Given this result, we
cannot confirm that the variances of all populations are equal. However, to keep the ANOVA
case study simple, we proceed under the assumption that the population variances are equal;
the limitation of this assumption is discussed in the evaluation section.
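The summary statistics in Table 2 can be reproduced from the Table 1 data. The sketch below uses Python's standard `statistics` module instead of Excel; the variable names (`data`, `summary`, `lowest`, `highest`) are illustrative and not part of the original analysis.

```python
from statistics import mean, variance

# Table 1 data: urban unemployment rate (%) by region, 2000-2008.
data = {
    "Red River Delta": [7.34, 7.07, 6.64, 6.38, 6.03, 5.61, 6.42, 5.74, 5.35],
    "Northeast": [6.49, 6.73, 6.10, 5.93, 5.41, 5.07, 4.18, 3.85, 4.17],
    "Northwest": [6.02, 5.62, 5.11, 5.19, 5.41, 5.07, 4.18, 3.85, 4.17],
    "North Central Coast": [6.87, 6.72, 5.82, 5.45, 5.56, 5.20, 5.50, 4.95, 4.77],
    "South Central Coast": [6.31, 6.16, 5.50, 5.46, 5.56, 5.20, 5.50, 4.95, 4.77],
    "Central Highlands": [5.16, 5.55, 4.90, 4.39, 4.53, 4.23, 2.38, 2.11, 2.51],
    "Southeast": [6.16, 5.92, 6.30, 6.08, 5.92, 5.62, 5.47, 4.83, 4.89],
    "Mekong River Delta": [6.15, 6.08, 5.50, 5.26, 5.03, 4.87, 4.52, 4.03, 4.12],
}

# statistics.variance is the sample variance (divides by n - 1),
# which is the quantity reported in the Variance column of Table 2.
summary = {
    region: (len(x), sum(x), mean(x), variance(x)) for region, x in data.items()
}

for region, (count, total, avg, var) in summary.items():
    print(f"{region:20s} {count:2d} {total:6.2f} {avg:8.4f} {var:8.4f}")

# Regions with the smallest and largest sample variance.
lowest = min(summary, key=lambda r: summary[r][3])
highest = max(summary, key=lambda r: summary[r][3])
print("lowest variance:", lowest, "| highest variance:", highest)
```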
3.3 F-TEST PROCEDURE

Step 1: The null and alternative hypotheses
Ho: All population means are equal
Ha: At least two means differ
Step 2: Test statistic
Assuming the required conditions are satisfied, we use the F-statistic:
$F = \frac{MST}{MSE}$
which follows an F-distribution with $\nu_1 = k - 1$ and $\nu_2 = n - k$ degrees of freedom,
where k = 8 is the number of regions and n = 72 is the total number of observations.
Step 3: Level of significance
α = 0.05
Step 4: Decision rule
Critical value: $F_{\alpha,\nu_1,\nu_2} = F_{0.05,7,64} \approx 2.16$
Rejection region: reject Ho if F > 2.16
Step 5: Value of the statistic
Table 3: The ANOVA table

Source of Variation  SS      d.f.  MS     F      P-value      F crit
Between Groups       28.943  7     4.135  6.045  1.97345E-05  2.156
Within Groups        43.773  64    0.684
Total                72.717  71
Using Excel, we obtain a test statistic of F = 6.045.
Step 6: Conclusion
As F = 6.045 > 2.16, we reject Ho at α = 0.05.
Therefore, there is enough statistical evidence to infer at the 5% level of significance that
the mean annual unemployment rate in urban areas differs among the eight regions.
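The arithmetic behind the ANOVA table can be sketched in plain Python from the Table 1 data. This is an illustration, not the Excel procedure the report used: the variable names are our own, and the critical value 2.16 is taken from the report, not computed.

```python
# Table 1 data: urban unemployment rate (%) by region, 2000-2008.
data = {
    "Red River Delta": [7.34, 7.07, 6.64, 6.38, 6.03, 5.61, 6.42, 5.74, 5.35],
    "Northeast": [6.49, 6.73, 6.10, 5.93, 5.41, 5.07, 4.18, 3.85, 4.17],
    "Northwest": [6.02, 5.62, 5.11, 5.19, 5.41, 5.07, 4.18, 3.85, 4.17],
    "North Central Coast": [6.87, 6.72, 5.82, 5.45, 5.56, 5.20, 5.50, 4.95, 4.77],
    "South Central Coast": [6.31, 6.16, 5.50, 5.46, 5.56, 5.20, 5.50, 4.95, 4.77],
    "Central Highlands": [5.16, 5.55, 4.90, 4.39, 4.53, 4.23, 2.38, 2.11, 2.51],
    "Southeast": [6.16, 5.92, 6.30, 6.08, 5.92, 5.62, 5.47, 4.83, 4.89],
    "Mekong River Delta": [6.15, 6.08, 5.50, 5.26, 5.03, 4.87, 4.52, 4.03, 4.12],
}

k = len(data)                              # number of treatments (regions)
n = sum(len(x) for x in data.values())     # total number of observations
grand_mean = sum(sum(x) for x in data.values()) / n

# Between-groups sum of squares: spread of the group means around the grand mean.
sst = sum(len(x) * (sum(x) / len(x) - grand_mean) ** 2 for x in data.values())

# Within-groups sum of squares: spread of observations around their own group mean.
sse = sum(sum((v - sum(x) / len(x)) ** 2 for v in x) for x in data.values())

mst = sst / (k - 1)    # mean square for treatments
mse = sse / (n - k)    # mean square for error
f_stat = mst / mse

F_CRIT = 2.16          # critical value F(0.05; 7, 64) as quoted in the report
reject_h0 = f_stat > F_CRIT

print(f"SST = {sst:.3f}, SSE = {sse:.3f}")
print(f"MST = {mst:.3f}, MSE = {mse:.3f}, F = {f_stat:.3f}")
print("Reject Ho" if reject_h0 else "Do not reject Ho")
```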
3.4 FURTHER DISCUSSION ON THE RESULT OF ANOVA

As the F-test of the ANOVA shows, we can conclude at the 5% level of significance that at
least two regions in Vietnam have different mean annual unemployment rates in urban areas.
To illustrate this result, we test the difference in the annual urban unemployment rate
between one pair of regions: the Red River Delta and the Central Highlands.
Because the objective is to compare two population means, the two populations are assumed
to be normally distributed based on the histograms of their samples, and the population
variances are believed to be equal, we conduct a t-test for the difference between population
means. The detailed procedure of this test is described in the appendix. The test gives enough
statistical evidence to infer at the 5% level of significance that these two regions have
different unemployment rates, which is consistent with the ANOVA result.
4. EVALUATION
4.1 LIMITATIONS
Throughout the ANOVA testing process, we have to admit that there are some limitations
regarding our collected data and our assumptions.
First, regarding the conditions required for the ANOVA procedure, we had to assume that
the variances of the eight populations are equal and that the populations are normally
distributed. In fact, the F-test for population variances indicates that the population variances
may not be equal: the sample variances range from 0.252675 to 1.681775. In addition, the
normality requirement was checked solely with histograms, so the normality assumption may
not hold exactly. These limitations affect the result of our test and make it less than fully
reliable.
Second, our study considers only one factor affecting the unemployment rate in the eight
main regions of Vietnam, the geographic factor, although several other factors affect this
rate, such as government policies, the quality of the workforce and the types of
unemployment. Therefore, we can only explain the differences in terms of geography, which
is not enough to address the problem of unemployment fully.
The last main limitation concerns the data used in our research. Because of the limitations of
the data source, the figures are not recent: they are the annual unemployment rates measured
from 2000 to 2008. Another problem is that we used data for the urban areas of the eight
regions only, so the result is not enough to give a general view of the unemployment issue.
4.2 IMPLICATION
In spite of the drawbacks mentioned above, this study offers some insight into the
differences in the unemployment rate among regions in Vietnam. In particular, the results
suggest that the geographic factor matters for the unemployment rate and, more generally,
for the labor market and the economy as a whole. Moreover, this study may prompt further
studies of this issue, preferably on a larger scale and with more sophisticated methods.
5. CONCLUSION AND RECOMMENDATION
5.1 SUMMARY OF FINDINGS AND INTERPRETATIONS

Our study shows that the mean urban unemployment rate differs between at least two of the
eight regions in Vietnam. Specifically, this is supported by the hypothesis test comparing the
Red River Delta and the Central Highlands.
5.2 RECOMMENDATIONS
The figures used in this research suggest that the average urban unemployment rate in
Vietnam was quite high from 2000 to 2008. Unemployment lowers the living standard of
people without work and affects the economy as well as society as a whole. As geography
appears to be an important factor affecting the unemployment rate, more research should be
conducted to explain its effect. The government should consider the results of such research
when looking for solutions to the unemployment problem.
In addition, we recommend that follow-up studies examine other factors that affect
unemployment. They should also use more sophisticated testing methods so that the
unemployment rate can be understood more clearly.
Lastly, subsequent research should use larger samples and more up-to-date figures, which
would make the test results more meaningful.
REFERENCES
• Selvanathan, A., Selvanathan, S., Keller, G. and Warrack, B., 2004, Australian Business
Statistics, 3rd edition, Thomson, Australia
• Levine, D.M., Stephan, D.F., Krehbiel, T.C. and Berenson, M.L., 2004, Statistics for
Managers, 5th edition, Pearson, USA
• General Statistics Office, 'The unemployment rate of labor force in urban area by regions
(1996-2008)', downloaded March 18th 2011 at
http://www.molisa.gov.vn/docs/SLTK/DetailSLTK/tabid/215/DocID/4686/TabModuleSettingsId/496/language/vi-VN/Default.aspx
APPENDIX
1. HISTOGRAMS
2. F-TEST FOR THE POPULATION VARIANCES

THE SOUTH CENTRAL COAST AND CENTRAL HIGHLANDS
The Central Highlands region is denoted as population 1
The South Central Coast region is denoted as population 2
Step 1: The null and alternative hypotheses
Ho: $\sigma_1^2 / \sigma_2^2 = 1$
Ha: $\sigma_1^2 / \sigma_2^2 \neq 1$
Step 2: The test statistic
As we want to compare two population variances and the populations are independently
sampled, we use the F-statistic:
$F = \frac{s_1^2}{s_2^2}$
which follows an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom
Step 3: Level of significance
α = 0.05
Step 4: Decision rule
Critical values: $F_{\alpha/2,\nu_1,\nu_2} = F_{0.025,8,8} \approx 4.43$
$F_{1-\alpha/2,\nu_1,\nu_2} = F_{0.975,8,8} \approx 0.23$
Reject Ho if F > 4.43 or F < 0.23
Step 5: Value of the test statistic
$s_1^2 \approx 1.68$
$s_2^2 \approx 0.25$
$F = \frac{s_1^2}{s_2^2} = \frac{1.68}{0.25} = 6.72$
Step 6: Conclusion
Since F = 6.72 > 4.43, we reject Ho.
There is enough evidence to conclude at the 5% level of significance that the variances of
these two populations differ.
THE RED RIVER DELTA AND CENTRAL HIGHLANDS
The Central Highlands region is denoted as population 1
The Red River Delta region is denoted as population 2
Step 1: The null and alternative hypotheses
Ho: $\sigma_1^2 / \sigma_2^2 = 1$
Ha: $\sigma_1^2 / \sigma_2^2 \neq 1$
Step 2: The test statistic
As we want to compare two population variances and the populations are independently
sampled, we use the F-statistic:
$F = \frac{s_1^2}{s_2^2}$
which follows an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom
Step 3: Level of significance
α = 0.05
Step 4: Decision rule
Critical values: $F_{\alpha/2,\nu_1,\nu_2} = F_{0.025,8,8} \approx 4.43$
$F_{1-\alpha/2,\nu_1,\nu_2} = F_{0.975,8,8} \approx 0.23$
Reject Ho if F > 4.43 or F < 0.23
Step 5: Value of the test statistic
$s_1^2 \approx 1.68$
$s_2^2 \approx 0.45$
$F = \frac{s_1^2}{s_2^2} = \frac{1.68}{0.45} = 3.73$
Step 6: Conclusion
Since 0.23 < F = 3.73 < 4.43, we do not reject Ho.
There is not enough evidence to conclude at the 5% level of significance that the variances of
these two populations differ.
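The two variance-ratio tests above can be checked with a short sketch in plain Python. The function name `variance_ratio_test` is hypothetical, and the critical values 4.43 and 0.23 are taken from the report, not computed. Because the sketch uses unrounded sample variances, its F values (about 6.66 and 3.76) differ slightly from the 6.72 and 3.73 obtained above from rounded variances; the conclusions are unchanged.

```python
from statistics import variance

# Samples from Table 1 (urban unemployment rate, %, 2000-2008).
central_highlands = [5.16, 5.55, 4.90, 4.39, 4.53, 4.23, 2.38, 2.11, 2.51]
south_central_coast = [6.31, 6.16, 5.50, 5.46, 5.56, 5.20, 5.50, 4.95, 4.77]
red_river_delta = [7.34, 7.07, 6.64, 6.38, 6.03, 5.61, 6.42, 5.74, 5.35]

# Two-tailed critical values F(0.025; 8, 8) and F(0.975; 8, 8) quoted in the report.
F_UPPER = 4.43
F_LOWER = 0.23


def variance_ratio_test(sample_1, sample_2):
    """Return the ratio of sample variances and whether Ho (equal variances) is rejected."""
    f_stat = variance(sample_1) / variance(sample_2)
    reject = f_stat > F_UPPER or f_stat < F_LOWER
    return f_stat, reject


f_scc, reject_scc = variance_ratio_test(central_highlands, south_central_coast)
f_rrd, reject_rrd = variance_ratio_test(central_highlands, red_river_delta)

print(f"Central Highlands vs South Central Coast: F = {f_scc:.2f}, reject Ho: {reject_scc}")
print(f"Central Highlands vs Red River Delta:     F = {f_rrd:.2f}, reject Ho: {reject_rrd}")
```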
3. T-TEST FOR POPULATION MEANS

THE RED RIVER DELTA AND CENTRAL HIGHLANDS REGIONS
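The Central Highlands region is denoted as population 1
The Red River Delta region is denoted as population 2
Step 1: The null and alternative hypotheses
Ho: $\mu_1 - \mu_2 = 0$
Ha: $\mu_1 - \mu_2 \neq 0$
Step 2: Test statistic
Because the objective is to compare two population means, the two populations are
independently sampled and assumed to be normally distributed, and the variances are
unknown but assumed to be equal (the F-test above did not reject equality), we use the
t-statistic for populations with equal variances:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
with $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $df = n_1 + n_2 - 2$
Step 3: Level of significance
α = 0.05
Step 4: Decision rule
Critical value: $t_{\alpha/2,df} = t_{0.025,16} = 2.12$
Reject Ho if t > 2.12 or t < -2.12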
Step 5: Value of the test statistic
Using the computer, we obtain the following result table:
Table 4: The result of t-test

                              Central Highlands  Red River Delta
Mean                          3.973333333        6.286666667
Variance                      1.681775           0.4468
Observations                  9                  9
Pooled Variance               1.0642875
Hypothesized Mean Difference  0
D.f                           16
t Stat                        -4.7568011
P(T<=t) one-tail              0.000107211
t Critical one-tail           1.745883669
P(T<=t) two-tail              0.000214423
t Critical two-tail           2.119905285
From the table, t ≈ -4.76, so |t| ≈ 4.76.
Step 6: Conclusion
As |t| ≈ 4.76 > 2.12, we reject Ho.
Therefore, there is enough statistical evidence to infer at the 5% level of significance that
the mean unemployment rates in the Red River Delta and Central Highlands regions are
different.
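The pooled-variance t-test above can be reproduced with a few lines of plain Python. This is a sketch for checking the arithmetic, not the Excel routine used in the report; the variable names are our own, and the critical value 2.12 is taken from the report, not computed.

```python
from math import sqrt
from statistics import mean, variance

# Samples from Table 1 (urban unemployment rate, %, 2000-2008).
central_highlands = [5.16, 5.55, 4.90, 4.39, 4.53, 4.23, 2.38, 2.11, 2.51]  # population 1
red_river_delta = [7.34, 7.07, 6.64, 6.38, 6.03, 5.61, 6.42, 5.74, 5.35]    # population 2

n1, n2 = len(central_highlands), len(red_river_delta)
df = n1 + n2 - 2

# Pooled variance: weighted average of the two sample variances.
sp2 = ((n1 - 1) * variance(central_highlands)
       + (n2 - 1) * variance(red_river_delta)) / df

# Under Ho the hypothesized difference mu1 - mu2 is 0.
t_stat = (mean(central_highlands) - mean(red_river_delta)) / sqrt(sp2 * (1 / n1 + 1 / n2))

T_CRIT = 2.12  # two-tailed critical value t(0.025; 16) as quoted in the report
reject_h0 = abs(t_stat) > T_CRIT

print(f"pooled variance = {sp2:.7f}, df = {df}, t = {t_stat:.4f}")
print("Reject Ho" if reject_h0 else "Do not reject Ho")
```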