
HANOI UNIVERSITY

FACULTY OF MANAGEMENT AND TOURISM


STATISTICS FOR ECONOMICS

ANOVA Case Study


THE UNEMPLOYMENT RATE OF URBAN AREA

IN EIGHT REGIONS IN VIETNAM

Tutor: Ms. Lê Thị Ngọc Tú


Tutorial 2 – AC09
Group members:
Lê Thu Hằng ID: 0904010026
Lê Thị Thanh Tâm ID: 0904010093
Bùi Thị Thanh Hà ID: 0904010018
Lưu Hồng Hạnh ID: 0904010024
Lê Phương Quyên ID: 0904010090
Nguyễn Thị Hà Giang ID: 0804010015
Phan Hoàng Minh Thúy ID: 0804040107

1. INTRODUCTION
Losing a job can be one of the most upsetting events in an individual's life. A job loss means a
lower living standard in the present, anxiety about the future, and reduced self-esteem. In
addition, unemployment affects not only individuals but also the economy as a whole. As the
country develops, unemployment has become a focus of attention and a priority for Vietnam's
economic growth.

The unemployment rate is one of the economic indicators used to assess the general state of the
economy and its potential for growth. Based on the unemployment rate, economists judge the
current economic climate and analyze where the country is heading in terms of jobs and outlook.
The unemployment rate also gives job seekers an idea of how competitive the job market is,
helping them decide on an appropriate course of action. Depending on the unemployment figures,
the government may step in and offer assistance to jumpstart the economy.

The unemployment rate is affected by many factors, such as geography, education and customs.
Therefore, out of concern for the unemployment situation in Vietnam, we carried out a project to
check whether the unemployment rate differs between regions of Vietnam and to assess the effect
of the geographic factor on these regional differences.

To answer this question, we chose analysis of variance (ANOVA) as the statistical method.
Specifically, we test the equality of the mean annual unemployment rate in urban areas across
the eight regions: Northwest (Tay Bac), Northeast (Dong Bac), Red River Delta (Dong Bang Song
Hong), North Central Coast (Bac Trung Bo), South Central Coast (Nam Trung Bo), Central
Highlands (Tay Nguyen), Southeast (Dong Nam Bo) and Mekong River Delta (Dong Bang Song
Cuu Long). Based on the result of this analysis, we offer some recommendations to the
government on unemployment, a critical economic problem.

After analyzing and testing the collected data, we found sufficient statistical evidence to
conclude that the mean urban unemployment rate differs between at least two of these regions.
Specifically, because the test statistic (F = 6.05) exceeds the critical value
($F_{0.05,7,64} \approx 2.16$), we reject $H_0$.


2. RESEARCH METHODOLOGY

2.1 POPULATIONS
For this analysis, the experimental units are the years in which the unemployment rate was
recorded. The response variable is the annual unemployment rate in urban areas of each region.
The factor that defines the populations is region, so there are eight populations corresponding
to the eight regions: Northwest, Northeast, Red River Delta, North Central Coast, South Central
Coast, Central Highlands, Southeast and Mekong River Delta.

2.2 SAMPLE SIZE


As we use single-factor analysis of variance to compare the means of eight populations, we
draw a sample of 9 years (2000 to 2008) from each population, giving n = 8 × 9 = 72
observations in total.

2.3 DATA COLLECTION


We collected data on the unemployment rate of the labor force in urban areas by region from
2000 to 2008 by downloading the file available on the official website of the Vietnam Ministry
of Labor, Invalids and Social Affairs (MOLISA). Table 1 shows the treatments (regions) and the
responses (annual rates):

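Table 1: Annual unemployment rate of the urban labor force in eight regions, 2000-2008 (%)

| Year | Red River Delta | Northeast | Northwest | North Central Coast | South Central Coast | Central Highlands | Southeast | Mekong River Delta |
|------|------|------|------|------|------|------|------|------|
| 2000 | 7.34 | 6.49 | 6.02 | 6.87 | 6.31 | 5.16 | 6.16 | 6.15 |
| 2001 | 7.07 | 6.73 | 5.62 | 6.72 | 6.16 | 5.55 | 5.92 | 6.08 |
| 2002 | 6.64 | 6.10 | 5.11 | 5.82 | 5.50 | 4.90 | 6.30 | 5.50 |
| 2003 | 6.38 | 5.93 | 5.19 | 5.45 | 5.46 | 4.39 | 6.08 | 5.26 |
| 2004 | 6.03 | 5.41 | 5.41 | 5.56 | 5.56 | 4.53 | 5.92 | 5.03 |
| 2005 | 5.61 | 5.07 | 5.07 | 5.20 | 5.20 | 4.23 | 5.62 | 4.87 |
| 2006 | 6.42 | 4.18 | 4.18 | 5.50 | 5.50 | 2.38 | 5.47 | 4.52 |
| 2007 | 5.74 | 3.85 | 3.85 | 4.95 | 4.95 | 2.11 | 4.83 | 4.03 |
| 2008 | 5.35 | 4.17 | 4.17 | 4.77 | 4.77 | 2.51 | 4.89 | 4.12 |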


2.4 DATA PROCESSING


After the data were collected, we processed them in Microsoft Excel. First, we produced
histograms with Excel's Data Analysis tool (Histogram) to check the normality requirement.
Next, we used F-tests to check whether the population variances differ. Once the required
conditions had been checked, we used the "ANOVA: Single Factor" tool to carry out the analysis
of variance, which produces the ANOVA table presented in the ANOVA and discussion section.
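The study did this processing in Excel. As a cross-check, the first step can be reproduced with a minimal Python sketch using only the standard library (our choice of tool, not the one used in the study; the names `RATES` and `summarise` are hypothetical). It computes, for each region in Table 1, the count, sum, mean and sample variance (denominator n - 1) that appear in the summary table of Section 3.2.

```python
from statistics import mean, variance

# Table 1: annual urban unemployment rate (%), one value per year from 2000 to 2008
RATES = {
    "Red River Delta":     [7.34, 7.07, 6.64, 6.38, 6.03, 5.61, 6.42, 5.74, 5.35],
    "Northeast":           [6.49, 6.73, 6.10, 5.93, 5.41, 5.07, 4.18, 3.85, 4.17],
    "Northwest":           [6.02, 5.62, 5.11, 5.19, 5.41, 5.07, 4.18, 3.85, 4.17],
    "North Central Coast": [6.87, 6.72, 5.82, 5.45, 5.56, 5.20, 5.50, 4.95, 4.77],
    "South Central Coast": [6.31, 6.16, 5.50, 5.46, 5.56, 5.20, 5.50, 4.95, 4.77],
    "Central Highlands":   [5.16, 5.55, 4.90, 4.39, 4.53, 4.23, 2.38, 2.11, 2.51],
    "Southeast":           [6.16, 5.92, 6.30, 6.08, 5.92, 5.62, 5.47, 4.83, 4.89],
    "Mekong River Delta":  [6.15, 6.08, 5.50, 5.26, 5.03, 4.87, 4.52, 4.03, 4.12],
}


def summarise(rates):
    """Count, sum, mean and sample variance (n - 1 denominator) for each region."""
    return {
        region: {
            "count": len(values),
            "sum": sum(values),
            "mean": mean(values),
            "variance": variance(values),
        }
        for region, values in rates.items()
    }


summary = summarise(RATES)
for region, s in summary.items():
    print(f"{region:<20} {s['count']:>3} {s['sum']:>7.2f} {s['mean']:>8.4f} {s['variance']:>9.6f}")

# The two extreme sample variances are the ones compared in the variance check
lowest = min(summary, key=lambda r: summary[r]["variance"])
highest = max(summary, key=lambda r: summary[r]["variance"])
print("Lowest variance:", lowest, "| Highest variance:", highest)
```

The smallest and largest variances belong to the South Central Coast and the Central Highlands, the two regions compared in the variance test.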

2.5 SIGNIFICANCE LEVEL OF TEST


➢ α = 0.05

Since the level of significance is the probability of making a Type I error (rejecting a true
null hypothesis), we chose the common significance level α = 0.05 for the ANOVA and all
supporting tests in this case study, that is, a 5 in 100 chance of making a Type I error. This
conventional level balances the risk of a Type I error against the risk of a Type II error.

3. ANOVA AND DISCUSSION

3.1 ASSUMPTIONS FOR ONE-WAY ANOVA


One-way ANOVA using the F-test is applied to quantitative data from independent populations,
and its purpose is to compare two or more population means. The method requires the
following assumptions:

➢ The samples are drawn independently from the populations

➢ The populations are normally distributed

➢ The population variances are equal

When these conditions hold, the F-statistic follows an F-distribution under the null
hypothesis, so the stated significance level of the test is valid.

3.2 CHECKING REQUIRED CONDITIONS


Because the eight populations are defined by eight distinct geographic regions, the samples
drawn from them are independent.

Check the normality of the populations

We have eight populations corresponding to the eight regions of Vietnam. For each population,
we check normality by producing a histogram of the sample drawn from that population. As
none of the eight histograms is extremely non-normal, we assume that the eight populations
are normally distributed.


Check the equality of the population variances

To save time, we test only the difference between the lowest and the highest sample variances.
If these two do not differ significantly, it is reasonable to believe that the variances of all
eight populations are equal.

Table 2: Summary of eight samples

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Red River Delta | 9 | 56.58 | 6.286666667 | 0.4468 |
| Northeast | 9 | 47.93 | 5.325555556 | 1.148852778 |
| Northwest | 9 | 44.62 | 4.957777778 | 0.539219444 |
| North Central Coast | 9 | 50.84 | 5.648888889 | 0.524961111 |
| South Central Coast | 9 | 49.41 | 5.49 | 0.252675 |
| Central Highlands | 9 | 35.76 | 3.973333333 | 1.681775 |
| Southeast | 9 | 51.19 | 5.687777778 | 0.285469444 |
| Mekong River Delta | 9 | 45.56 | 5.062222222 | 0.591894444 |

From Table 2, the South Central Coast has the lowest sample variance and the Central Highlands
has the highest, at 0.25 and 1.68 respectively. The F-test for two population variances (see the
appendix) gives sufficient statistical evidence at the 5% level of significance to conclude that
the population variances of these two treatments are unequal. Therefore, we cannot confirm that
the variances of all eight populations are equal. However, to keep the case study simple, we
assume that the population variances are equal; the limitation of this assumption is discussed
in the evaluation section.

3.3 F-TEST PROCEDURE


Step 1: The null and alternative hypotheses

$H_0: \mu_1 = \mu_2 = \dots = \mu_8$ (all population means are equal)

$H_a$: At least two means differ

Step 2: Test statistic

Assuming the required conditions hold, we use the F-statistic:

$$F = \frac{MST}{MSE}$$

which follows an F-distribution with $\nu_1 = k - 1$ and $\nu_2 = n - k$ degrees of freedom; here
$k = 8$ and $n = 72$, so $\nu_1 = 7$ and $\nu_2 = 64$.
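For reference, the quantities in this statistic are defined below in standard one-way ANOVA notation (a textbook sketch, not taken from the Excel output). Here $\bar{x}_j$ and $s_j^2$ are the sample mean and variance of region $j$, $n_j = 9$, $k = 8$, $n = 72$, $\bar{\bar{x}}$ is the grand mean of all 72 observations, and SST is the sum of squares for treatments (the Between Groups row of the ANOVA table):

$$
\begin{aligned}
SST &= \sum_{j=1}^{k} n_j\left(\bar{x}_j - \bar{\bar{x}}\right)^2, & MST &= \frac{SST}{k-1},\\
SSE &= \sum_{j=1}^{k} (n_j - 1)\,s_j^2, & MSE &= \frac{SSE}{n-k}.
\end{aligned}
$$

With the sample variances of Table 2, for example, $SSE = 8 \times 5.4716 \approx 43.773$, which matches the Within Groups row of the ANOVA table.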

Step 3: Level of significance

α=0.05

Step 4: Decision rule

Critical value: $F_{\alpha,\nu_1,\nu_2} = F_{0.05,7,64} \approx 2.16$

Rejection region: reject $H_0$ if $F > 2.16$
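The critical values in this report come from Excel or F tables. To check them without statistical software, here is a minimal standard-library Python sketch (function names are our own, hypothetical): it evaluates the F cumulative distribution through the regularized incomplete beta function, using the continued-fraction method described in Numerical Recipes, and inverts it by bisection. In practice a library routine such as `scipy.stats.f.ppf(1 - alpha, d1, d2)` does the same job.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """P(F <= f) for an F-distribution with (d1, d2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_critical(alpha, d1, d2):
    """Upper-tail critical value: P(F > value) = alpha, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, d1, d2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(f"F(0.05; 7, 64) = {f_critical(0.05, 7, 64):.3f}")   # ANOVA decision rule
print(f"F(0.025; 8, 8) = {f_critical(0.025, 8, 8):.2f}")   # variance test, upper
print(f"F(0.975; 8, 8) = {f_critical(0.975, 8, 8):.2f}")   # variance test, lower
# t_{alpha/2, v} squared equals F_{alpha, 1, v}, which gives the two-tailed t value
print(f"t(0.025; 16)   = {math.sqrt(f_critical(0.05, 1, 16)):.3f}")
```

The printed values should agree with the critical values used in this report (about 2.16, 4.43, 0.23 and 2.12). The last line relies on the identity $t_{\alpha/2,\nu}^2 = F_{\alpha,1,\nu}$.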

Step 5: Value of the statistic

Table 3: The ANOVA table

| Source of Variation | SS | d.f. | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 28.943 | 7 | 4.135 | 6.045 | 1.97 × 10^-5 | 2.156 |
| Within Groups | 43.773 | 64 | 0.684 | | | |
| Total | 72.717 | 71 | | | | |

Using Excel, we obtain the test statistic $F = 6.045$.
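The Excel result can also be reproduced from the summary statistics in Table 2 alone, because SST and SSE depend only on each region's count, mean and sample variance. A minimal Python sketch (the function name `one_way_anova` is hypothetical; the inputs are typed in from Table 2):

```python
# Table 2 summary statistics: region -> (count, mean, sample variance)
SUMMARY = {
    "Red River Delta": (9, 6.286666667, 0.4468),
    "Northeast": (9, 5.325555556, 1.148852778),
    "Northwest": (9, 4.957777778, 0.539219444),
    "North Central Coast": (9, 5.648888889, 0.524961111),
    "South Central Coast": (9, 5.49, 0.252675),
    "Central Highlands": (9, 3.973333333, 1.681775),
    "Southeast": (9, 5.687777778, 0.285469444),
    "Mekong River Delta": (9, 5.062222222, 0.591894444),
}


def one_way_anova(summary):
    """One-way ANOVA quantities computed from per-group (count, mean, variance)."""
    groups = list(summary.values())
    k = len(groups)
    n = sum(count for count, _, _ in groups)
    grand_mean = sum(count * m for count, m, _ in groups) / n
    sst = sum(count * (m - grand_mean) ** 2 for count, m, _ in groups)  # between groups
    sse = sum((count - 1) * var for count, _, var in groups)            # within groups
    df1, df2 = k - 1, n - k
    mst, mse = sst / df1, sse / df2
    return {"SST": sst, "SSE": sse, "df1": df1, "df2": df2,
            "MST": mst, "MSE": mse, "F": mst / mse}


result = one_way_anova(SUMMARY)
for name in ("SST", "SSE", "MST", "MSE", "F"):
    print(f"{name} = {result[name]:.3f}")
print(f"df = ({result['df1']}, {result['df2']})")
```

Printed to three decimals, the sums of squares, mean squares and F agree with the ANOVA table (F ≈ 6.045 with 7 and 64 degrees of freedom).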

Step 6: Conclusion

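Since $F = 6.045 > 2.16$ (equivalently, the p-value $1.97 \times 10^{-5} < 0.05$), we reject $H_0$ at
$\alpha = 0.05$.

Therefore, there is enough statistical evidence to infer at the 5% level of significance that the
mean annual unemployment rate in urban areas differs among the eight regions, that is, at least
two regional means differ.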

3.4 FURTHER DISCUSSION ON THE RESULT OF ANOVA


As the F-test of the ANOVA shows, we can conclude at the 5% level of significance that at least
two regions in Vietnam have different mean annual unemployment rates in urban areas. To
support this result, we additionally test the difference in the mean annual urban unemployment
rate between two regions: the Red River Delta and the Central Highlands, which have the highest
and the lowest sample means in Table 2.

Because the objective is to compare two population means, the two populations appear normal
from the histograms of their samples, and an F-test (see the appendix) does not reject the
equality of their variances, we conduct the equal-variances t-test for the difference between
population means. The detailed procedure of this test is described in the appendix. The result
shows that there is enough statistical evidence to infer at the 5% level of significance that
these two regions have different mean unemployment rates. This pairwise result is consistent
with the ANOVA conclusion.

4. EVALUATION

4.1 LIMITATIONS
Throughout the ANOVA testing process, we must admit that there are some limitations
regarding our collected data and our assumptions.

Firstly, regarding the conditions required for the ANOVA hypothesis testing procedure, we had
to assume that the variances of the eight populations are equal and that the populations are
normally distributed. In fact, the F-test for population variances indicates that the population
variances, whose sample estimates range from 0.252675 to 1.681775, may not be equal. In
addition, normality was checked solely from histograms, so the normality assumption may not
be fully correct. These limitations could affect the result of our test, so it should be
interpreted with some caution.

Secondly, our study considers only one factor affecting the unemployment rates in the eight
main regions of Vietnam, the geographic factor, although several other factors affect this rate,
such as government policies, the quality of the workforce and the types of unemployment.
Therefore, we can only explain the differences in terms of geography, which is not enough to
address the unemployment problem fully.

The last main limitation concerns the data used in our research. Because of the limited data
sources, the figures are not recent: they are annual unemployment rates for the period from 2000
to 2008. Another problem is that we only used data for the urban areas of the eight regions, so
the result does not provide a general view of the unemployment issue.

4.2 IMPLICATIONS
Despite the drawbacks mentioned above, this study offers some insight into the differences in
the unemployment rate between regions of Vietnam. In more detail, the results suggest that the
geographic factor matters for the unemployment rate in particular, and for the labor market and
the whole economy in general. This may also encourage further studies of the issue, preferably
on a larger scale and with more sophisticated methods.

5. CONCLUSION AND RECOMMENDATIONS

5.1 SUMMARY OF FINDINGS AND INTERPRETATIONS


The study finds that the urban unemployment rate differs between at least two of the eight
regions of Vietnam. In particular, the hypothesis test comparing the Red River Delta and the
Central Highlands supports this finding.

5.2 RECOMMENDATIONS
Based on the figures in this research, it is undeniable that the average unemployment rate in
Vietnam was quite high in the period from 2000 to 2008. Unemployment lowers the living
standard of people without work and affects the economy as well as society as a whole. As
geography is one major factor affecting the unemployment rate, more research should be
conducted to explain the effect of the geographic factor on the unemployment rate. The
government should consider the results of such research when looking for suitable solutions to
the unemployment problem.

In addition, we recommend that follow-up studies examine other factors affecting
unemployment. They should also use more sophisticated testing methods so that the
unemployment rate can be understood more clearly.

Lastly, subsequent research should use a larger population and up-to-date figures, which
would make the test more meaningful.

REFERENCES
• Selvanathan, A., Selvanathan, S., Keller, G. & Warrack, B. 2004, Australian Business Statistics,
3rd edition, Thomson, Australia
• Levine, D.M., Stephan, D.F., Krehbiel, T.C. & Berenson, M.L. 2004, Statistics for Managers, 5th
edition, Pearson, USA
• General Statistics Office, 'The unemployment rate of labor force in urban area by regions
(1996-2008)', downloaded 18 March 2011 at
http://www.molisa.gov.vn/docs/SLTK/DetailSLTK/tabid/215/DocID/4686/TabModuleSettingsId/496/language/vi-VN/Default.aspx


APPENDIX

1. HISTOGRAMS

2. F-TEST FOR THE POPULATION VARIANCES


THE SOUTH CENTRAL COAST AND CENTRAL HIGHLANDS

The Central Highlands region is denoted as population 1

The South Central Coast region is denoted as population 2

Step 1: The null and alternative hypotheses

$H_0: \sigma_1^2 / \sigma_2^2 = 1$

$H_a: \sigma_1^2 / \sigma_2^2 \neq 1$

Step 2: The test statistic

As we want to compare two population variances and the populations are independently
sampled, we use the F-statistic:

$$F = \frac{s_1^2}{s_2^2}$$

which follows an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.

Step 3: Level of significance

α=0.05

Step 4: Decision rule

Critical values: $F_{\alpha/2,\nu_1,\nu_2} = F_{0.025,8,8} \approx 4.43$

$F_{1-\alpha/2,\nu_1,\nu_2} = F_{0.975,8,8} \approx 0.23$

Reject $H_0$ if $F > 4.43$ or $F < 0.23$

Step 5: Value of the test statistic

$s_1^2 = 1.681775 \approx 1.68$

$s_2^2 = 0.252675 \approx 0.25$

$$F = \frac{s_1^2}{s_2^2} = \frac{1.681775}{0.252675} \approx 6.66$$

Step 6: Conclusion

Since $F \approx 6.66 > 4.43$, we reject $H_0$.

There is enough evidence to conclude at the 5% level of significance that the variances of these
two populations differ.

THE RED RIVER DELTA AND CENTRAL HIGHLANDS

The Central Highlands region is denoted as population 1

The Red River Delta region is denoted as population 2

Step 1: The null and alternative hypotheses

$H_0: \sigma_1^2 / \sigma_2^2 = 1$

$H_a: \sigma_1^2 / \sigma_2^2 \neq 1$

Step 2: The test statistic

As we want to compare two population variances and the populations are independently
sampled, we use the F-statistic:

$$F = \frac{s_1^2}{s_2^2}$$

which follows an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.

Step 3: Level of significance

α=0.05

Step 4: Decision rule

Critical values: $F_{\alpha/2,\nu_1,\nu_2} = F_{0.025,8,8} \approx 4.43$

$F_{1-\alpha/2,\nu_1,\nu_2} = F_{0.975,8,8} \approx 0.23$

Reject $H_0$ if $F > 4.43$ or $F < 0.23$


Step 5: Value of the test statistic

$s_1^2 = 1.681775 \approx 1.68$

$s_2^2 = 0.4468 \approx 0.45$

$$F = \frac{s_1^2}{s_2^2} = \frac{1.681775}{0.4468} \approx 3.76$$

Step 6: Conclusion

Since $0.23 < F \approx 3.76 < 4.43$, we do not reject $H_0$.

There is not enough evidence to conclude at the 5% level of significance that the variances of
these two populations differ.
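Both variance-ratio tests in this appendix can be checked with a few lines of Python. This is a sketch: the critical values 4.43 and 0.23 are taken from the decision rules above rather than computed, and the names are hypothetical. It uses the unrounded sample variances from Table 2, which gives F ≈ 6.66 and F ≈ 3.76; either way, the conclusions above are unchanged.

```python
# Sample variances from Table 2 (unrounded Excel output)
VAR_CENTRAL_HIGHLANDS = 1.681775
VAR_SOUTH_CENTRAL_COAST = 0.252675
VAR_RED_RIVER_DELTA = 0.4468

# Two-tailed critical values from the decision rules above (alpha = 0.05, df = 8 and 8)
F_UPPER = 4.43  # F_{0.025, 8, 8}
F_LOWER = 0.23  # F_{0.975, 8, 8}


def variance_ratio_test(var1, var2, lower=F_LOWER, upper=F_UPPER):
    """Return F = var1 / var2 and whether H0 (equal variances) is rejected."""
    f = var1 / var2
    return f, f > upper or f < lower


results = {
    "Central Highlands / South Central Coast":
        variance_ratio_test(VAR_CENTRAL_HIGHLANDS, VAR_SOUTH_CENTRAL_COAST),
    "Central Highlands / Red River Delta":
        variance_ratio_test(VAR_CENTRAL_HIGHLANDS, VAR_RED_RIVER_DELTA),
}
for pair, (f, reject) in results.items():
    print(f"{pair}: F = {f:.2f}, reject H0: {reject}")
```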

3. T-TEST FOR POPULATION MEANS


THE RED RIVER DELTA AND CENTRAL HIGHLANDS REGIONS

The Central Highlands region is denoted as population 1

The Red River Delta region is denoted as population 2

Step 1: The null and alternative hypotheses

$H_0: \mu_1 - \mu_2 = 0$

$H_a: \mu_1 - \mu_2 \neq 0$

Step 2: Test statistic

Because the objective is to compare two population means, the two populations are
independently sampled and normally distributed, and the variances are unknown but assumed
to be equal (as tested above), we use the equal-variances t-statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

with $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $df = n_1 + n_2 - 2$.

Step 3: Level of significance

α=0.05

Step 4: Decision rule

Critical value: $t_{\alpha/2,df} = t_{0.025,16} \approx 2.12$

Reject $H_0$ if $|t| > 2.12$, that is, if $t > 2.12$ or $t < -2.12$

Step 5: Value of the test statistic

Using the computer, we obtain the following results:

Table 4: The result of the t-test

| | Central Highlands | Red River Delta |
|---|---|---|
| Mean | 3.973333333 | 6.286666667 |
| Variance | 1.681775 | 0.4468 |
| Observations | 9 | 9 |
| Pooled Variance | 1.0642875 | |
| Hypothesized Mean Difference | 0 | |
| df | 16 | |
| t Stat | -4.7568011 | |
| P(T<=t) one-tail | 0.000107211 | |
| t Critical one-tail | 1.745883669 | |
| P(T<=t) two-tail | 0.000214423 | |
| t Critical two-tail | 2.119905285 | |

From the table, $t \approx -4.76$, so $|t| \approx 4.76$.

Step 6: Conclusion

As $|t| \approx 4.76 > 2.12$ (two-tail p-value $\approx 0.0002 < 0.05$), we reject $H_0$.

Therefore, there is enough statistical evidence to infer at the 5% level of significance that the
mean unemployment rates in the Red River Delta and Central Highlands regions differ.
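The pooled-variance t-test above can be reproduced from the sample means and variances in the result table. A minimal Python sketch (the function name `pooled_t_test` is hypothetical; the critical value 2.12 is taken from Step 4 rather than computed):

```python
import math


def pooled_t_test(mean1, var1, n1, mean2, var2, n2):
    """Equal-variances two-sample t statistic, testing H0: mu1 - mu2 = 0."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))         # standard error of the difference
    t = (mean1 - mean2) / se                        # hypothesized difference is 0
    return sp2, df, t


T_CRIT = 2.12  # t_{0.025, 16}, from the decision rule

# Population 1 = Central Highlands, population 2 = Red River Delta
sp2, df, t = pooled_t_test(3.973333333, 1.681775, 9, 6.286666667, 0.4468, 9)
print(f"pooled variance = {sp2:.7f}, df = {df}, t = {t:.4f}")
print("reject H0" if abs(t) > T_CRIT else "do not reject H0")
```

It gives a pooled variance of 1.0642875, 16 degrees of freedom and t ≈ -4.7568, matching the Excel output, and rejects $H_0$ because |t| exceeds 2.12.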
