
Visualization of Road Traffic Condition in Hong Kong

Lin Wencong 3035149568, Xia Fan 3035150256


7th Nov, 2014

This is a visualization project of both the real-time and the time-varying traffic condition of Hong Kong. The original real-time XML data provided by the Transport Department of Hong Kong is captured through HTTP GET requests by a Java program, followed by data aggregation and transformation. All the pre-processed data and other related information are stored in a purpose-designed database. The real-time traffic condition is presented with the Google Maps JavaScript API v3, and all the time-varying traffic condition analysis is done with Tableau. Both are backed by our own database, which is updated from GovHK every five minutes. Finally, the analysis results (filled maps, line charts, area charts and videos) and the live presentation of the real-time traffic condition are combined, organized and published on a web server. Just visit RoadTrafficHK and enjoy the visualization.

Data Source and Format


1. Original Data
All the data are obtained from GovHK. The data is provided via HTTP GET from the URL below and is updated about every five minutes:
http://resource.data.one.gov.hk/td/speedmap.xml
Each XML file consists of 617 records, representing 617 main roads of Hong Kong.
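A capture step like the one described above could be sketched in Java as follows. This is a minimal illustration, not the project's actual program: the class name `SnapshotDownloader` and the `speedmap_yyyyMMdd_HHmm.xml` file-naming scheme are hypothetical. For an `http:` URL, `URL.openStream()` issues a plain HTTP GET, and the response body is saved unchanged to a local file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class SnapshotDownloader {
    // Feed URL given in the report; updated roughly every five minutes.
    static final String SPEEDMAP_URL = "http://resource.data.one.gov.hk/td/speedmap.xml";

    // Hypothetical naming scheme: one file per capture, e.g. speedmap_20141107_1038.xml
    private static final DateTimeFormatter FILE_NAME =
            DateTimeFormatter.ofPattern("'speedmap_'yyyyMMdd'_'HHmm'.xml'");

    static String fileNameFor(LocalDateTime captureTime) {
        return captureTime.format(FILE_NAME);
    }

    // Reads the resource (an HTTP GET for http: URLs) and saves the body unchanged in dir.
    static Path download(URL source, Path dir, LocalDateTime captureTime) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(fileNameFor(captureTime));
        try (InputStream in = source.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

For example, `SnapshotDownloader.download(URI.create(SnapshotDownloader.SPEEDMAP_URL).toURL(), Path.of("data"), LocalDateTime.now())` would save one snapshot of the feed into a `data` directory.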

A single data record is shown below:

<jtis_speedmap>
<LINK_ID>3006-30069</LINK_ID>
<REGION>K</REGION>
<ROAD_TYPE>URBAN ROAD</ROAD_TYPE>
<ROAD_SATURATION_LEVEL>
TRAFFIC GOOD
</ROAD_SATURATION_LEVEL>
<TRAFFIC_SPEED>47</TRAFFIC_SPEED>
<CAPTURE_DATE>
2014-11-07T10:38:34
</CAPTURE_DATE>
</jtis_speedmap>
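The record above maps naturally onto a small value type. The sketch below assumes the standard JDK DOM parser and that the `<jtis_speedmap>` records sit under some root element of the downloaded file; it extracts every record and trims the surrounding whitespace visible in fields such as `<ROAD_SATURATION_LEVEL>` and `<CAPTURE_DATE>`. The names `SpeedmapParser` and `SpeedRecord` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SpeedmapParser {
    // One <jtis_speedmap> record; field names follow the XML tags.
    record SpeedRecord(String linkId, String region, String roadType,
                       String saturationLevel, int trafficSpeed, LocalDateTime captureDate) {}

    // Trimmed text of the first child element with the given tag (assumed present).
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    static List<SpeedRecord> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("jtis_speedmap");
        List<SpeedRecord> records = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            records.add(new SpeedRecord(
                    text(e, "LINK_ID"), text(e, "REGION"), text(e, "ROAD_TYPE"),
                    text(e, "ROAD_SATURATION_LEVEL"),
                    Integer.parseInt(text(e, "TRAFFIC_SPEED")),
                    // CAPTURE_DATE is an ISO-8601 local date-time, e.g. 2014-11-07T10:38:34
                    LocalDateTime.parse(text(e, "CAPTURE_DATE"))));
        }
        return records;
    }
}
```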
Field description:
Tag | Data format | Description | Sample
<LINK_ID> | start_node-end_node | The node IDs of the start node and the end node of a road in Hong Kong. | 3006-30069
<REGION> | ENUM | K: Kowloon; ST: Sha Tin; TM: Tuen Mun; HK: HK Island | K, ST, TM, HK
<ROAD_TYPE> | ENUM | Type of road. | URBAN ROAD, MAJOR ROUTE
<ROAD_SATURATION_LEVEL> | ENUM | Indicates the current road traffic condition in general. | TRAFFIC GOOD, TRAFFIC AVERAGE, TRAFFIC BAD
<TRAFFIC_SPEED> | INT | Exact figure of the current average road traffic speed over five minutes. | 45
<CAPTURE_DATE> | yyyy-MM-ddTHH:mm:ss | Exact date-time of the record. | 2014-11-07T10:38:34
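Field relationship (range of TRAFFIC_SPEED for each saturation level):
Saturation level | Urban road min | Urban road max | Major road min | Major road max
traffic bad | - | 14 | - | 24
traffic average | 15 | 29 | 25 | 49
traffic good | 30 | - | 50 | -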
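Read as inclusive integer ranges, the table above can be expressed as a small classifier. This is a hypothetical helper (`SaturationClassifier` is not part of the project) that assumes the ranges are exact, apply to whole-number speeds, and that "major road" corresponds to the ROAD_TYPE value MAJOR ROUTE; it could be used, for instance, to re-derive a saturation level from an averaged speed. It agrees with the sample record: an urban road at speed 47 is TRAFFIC GOOD.

```java
final class SaturationClassifier {
    enum Level { TRAFFIC_BAD, TRAFFIC_AVERAGE, TRAFFIC_GOOD }

    // Lower bounds (inclusive) of the AVERAGE and GOOD ranges in the table;
    // anything below the AVERAGE bound is BAD.
    static Level classify(String roadType, int speed) {
        int averageMin;
        int goodMin;
        if (roadType.equals("URBAN ROAD")) {
            averageMin = 15;
            goodMin = 30;
        } else if (roadType.equals("MAJOR ROUTE")) {
            averageMin = 25;
            goodMin = 50;
        } else {
            throw new IllegalArgumentException("Unknown road type: " + roadType);
        }
        if (speed >= goodMin) return Level.TRAFFIC_GOOD;
        if (speed >= averageMin) return Level.TRAFFIC_AVERAGE;
        return Level.TRAFFIC_BAD;
    }
}
```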

Special Instruction of LINK_ID:
LINK_ID is the tag in the raw data related to geographic position, but geographic positions cannot be calculated from LINK_ID alone. Another table provided by GovHK contains information about these links and their nodes. Its structure is as follows:
Attribute | Sample
Link ID | 722-50059
Start Node | 722
Start Node Eastings | 834038.674
Start Node Northings | 816345.067
End Node | 50059
End Node Eastings | 833862.7
End Node Northings | 816441.533
Region | HK
Road Type | MAJOR ROUTE

Each link is thus related to two nodes whose geographic positions are given, and the position of a road can be calculated from those of its start node and end node. All geographic information in this table is based on the HK 1980 Grid Coordinate system.
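Since a LINK_ID is just the start and end node IDs joined by a hyphen, it can be split to look up both nodes in the link/node table. The sketch below is hypothetical (`LinkGeometry` and its types are illustrative names): it parses the ID and, as one simple choice of a single position for a road, takes the midpoint of the segment between the two nodes in HK 1980 Grid metres, before any conversion to longitude and latitude. Drawing the road as a line would use both endpoints instead.

```java
final class LinkGeometry {
    record Link(int startNode, int endNode) {}
    record GridPoint(double easting, double northing) {}

    // LINK_ID has the form "<start_node>-<end_node>", e.g. "722-50059".
    static Link parseLinkId(String linkId) {
        String[] parts = linkId.trim().split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Bad LINK_ID: " + linkId);
        }
        return new Link(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    // Midpoint of the straight segment between the two nodes (HK 1980 Grid metres).
    static GridPoint midpoint(GridPoint start, GridPoint end) {
        return new GridPoint((start.easting() + end.easting()) / 2,
                             (start.northing() + end.northing()) / 2);
    }
}
```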

2. Data Pre-capturing and Data Pre-processing


For the time-varying traffic condition analysis, a large number of original data records is necessary. However, GovHK does not provide historical data, so the data had to be captured in advance. A Java programme was implemented on 29th Sep to record the XML file every five minutes. Because of some bugs and system crashes, the complete, intact data we finally obtained runs from 5th Oct to 14th Oct. In other words, 2880 XML files are the data source for the time-varying traffic condition analysis, and each XML file consists of 617 records in the raw data format above.
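The five-minute capture loop could be driven by a `ScheduledExecutorService`, as in this hypothetical sketch (`CaptureScheduler` and the `captureOnce` task are illustrative names, not the project's code). The task body is wrapped in a try/catch because `scheduleAtFixedRate` suppresses all later runs once a task throws, so an uncaught error in a single run would otherwise stop the capture. The helper also checks the arithmetic: 10 full days from 5th Oct to 14th Oct at 288 snapshots per day gives 2880 files.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class CaptureScheduler {
    static final Duration INTERVAL = Duration.ofMinutes(5);

    // Number of snapshots expected in [from, to) at the fixed capture interval.
    static long expectedSnapshots(LocalDateTime from, LocalDateTime to) {
        return Duration.between(from, to).toMinutes() / INTERVAL.toMinutes();
    }

    // Runs captureOnce immediately and then every five minutes; a failed run is
    // logged and the schedule continues.
    static ScheduledExecutorService start(Runnable captureOnce) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                captureOnce.run();
            } catch (RuntimeException e) {
                System.err.println("Capture failed: " + e);
            }
        }, 0, INTERVAL.toMinutes(), TimeUnit.MINUTES);
        return executor;
    }
}
```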
It is therefore vitally important to find a way to store and fetch these data efficiently and flexibly, and a database was our best choice. Another Java programme was implemented to extract all the raw data from the 2880 XML files and store each record in a database table. In the end, 1,776,960 records of raw data (2880 files × 617 records) were stored in a single table. This is too many for Tableau to process, and attributes such as LINK_ID have no direct real-life meaning. Some pre-processing is therefore required:

Transform attributes:
The HK 1980 Grid Coordinate system is used only in Hong Kong to locate places, so the coordinates need to be transformed into longitude and latitude.


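The Explanatory Notes on Geodetic Datums in Hong Kong provide a formula for this conversion. The conversion equations are as follows:
Longitude:
$$\lambda = \lambda_0 + \frac{\sec\varphi_p}{\nu_s}\left(\frac{\Delta E}{m_0}\right) - \frac{\sec\varphi_p}{6\nu_s^3}\left(\psi_s + 2\tan^2\varphi_p\right)\left(\frac{\Delta E}{m_0}\right)^3$$
Latitude:
$$\varphi = \varphi_p - \frac{\tan\varphi_p}{m_0\rho_s}\left(\frac{\Delta E^2}{2m_0\nu_s}\right)$$
where
$$\Delta E = E - E_0,\qquad \Delta N = N - N_0,\qquad \psi_s = \frac{\nu_s}{\rho_s}$$
and $\varphi_p$ is the latitude at which the meridian distance equals
$$M_p = \frac{\Delta N + M_0}{m_0}.$$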

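And here are the parameters for the formula:
Parameter | HK1980 Grid
N0 (false northing) | 819,069.80 m N
E0 (false easting) | 836,694.05 m E
φ0 (latitude of origin) | 22°18′43.68″ N
λ0 (longitude of origin) | 114°10′42.80″ E
m0 (scale factor) | 1
M0 (meridian distance at φ0) | 2,468,395.723 m
νs | 6,381,480.500 m
ρs | 6,359,840.760 m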
The conversion is performed by a Java application that updates all the node records in the MySQL database, setting their longitude and latitude.
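The formula can be implemented directly, as in the minimal sketch below; it is an illustration, not the project's Java application, and `Hk1980Grid` is a hypothetical name. Two inputs are assumptions not stated in the report: the ellipsoid (HK1980 is based on the International 1924 (Hayford) ellipsoid, a = 6,378,388 m, f = 1/297), which is needed for the meridian distance M(φ) = a(A0·φ - A2·sin 2φ + A4·sin 4φ) used to find φp, and the false northing, taken as 819,069.80 m (the EPSG:2326 value). φp is found with Newton's method. Note that the result is a latitude/longitude on the HK1980 datum; placing points on a WGS84 base map such as Google Maps needs a further datum shift, which this sketch omits.

```java
final class Hk1980Grid {
    // HK1980 Grid projection parameters (false northing 819,069.80 m is the EPSG:2326 value).
    static final double N0 = 819_069.80;
    static final double E0 = 836_694.05;
    static final double LAMBDA0 = Math.toRadians(114 + 10 / 60.0 + 42.80 / 3600.0);
    static final double SCALE = 1.0;            // m0, scale factor on the central meridian
    static final double M0 = 2_468_395.723;     // meridian distance at the origin latitude
    static final double NU_S = 6_381_480.500;
    static final double RHO_S = 6_359_840.760;
    static final double PSI_S = NU_S / RHO_S;

    // Assumption: International 1924 (Hayford) ellipsoid, which the report does not list.
    static final double A = 6_378_388.0;
    static final double F = 1.0 / 297.0;
    static final double E2 = 2 * F - F * F;
    private static final double A0 = 1 - E2 / 4 - 3 * E2 * E2 / 64;
    private static final double A2 = 3.0 / 8.0 * (E2 + E2 * E2 / 4);
    private static final double A4 = 15.0 / 256.0 * E2 * E2;

    // Meridian distance from the equator to latitude phi (radians).
    static double meridianDistance(double phi) {
        return A * (A0 * phi - A2 * Math.sin(2 * phi) + A4 * Math.sin(4 * phi));
    }

    // Latitude phi_p with M(phi_p) = target, solved by Newton's method.
    static double footLatitude(double target) {
        double phi = target / (A * A0);   // first guess ignoring the periodic terms
        for (int i = 0; i < 20; i++) {
            double slope = A * (A0 - 2 * A2 * Math.cos(2 * phi) + 4 * A4 * Math.cos(4 * phi));
            double step = (meridianDistance(phi) - target) / slope;
            phi -= step;
            if (Math.abs(step) < 1e-14) break;
        }
        return phi;
    }

    // Grid (E, N) in metres -> {latitude, longitude} in degrees on the HK1980 datum.
    static double[] toLatLon(double easting, double northing) {
        double dE = easting - E0;
        double dN = northing - N0;
        double phiP = footLatitude((dN + M0) / SCALE);
        double t = Math.tan(phiP);
        double sec = 1 / Math.cos(phiP);
        double x = dE / SCALE;
        double lambda = LAMBDA0 + sec / NU_S * x
                - sec / (6 * Math.pow(NU_S, 3)) * (PSI_S + 2 * t * t) * Math.pow(x, 3);
        double phi = phiP - t / (SCALE * RHO_S) * (dE * dE / (2 * SCALE * NU_S));
        return new double[] {Math.toDegrees(phi), Math.toDegrees(lambda)};
    }
}
```

As a sanity check, converting the grid origin (E0, N0) returns φ0 and λ0, because ΔE = 0 and M(φ0) equals M0.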

Data aggregation:
Going through the database, we found that the saturation level of most roads stays the same for at least 5 consecutive records, which means even the shortest traffic jam lasts for about half an hour (5 records × 5 minutes). The data can therefore be aggregated to reduce the number of records.
Besides, a 5-minute traffic fluctuation should be regarded as noise at the scale of a whole day's traffic condition.
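The aggregation could be implemented as below. This is a hypothetical sketch (`HalfHourAggregator` and its record types are illustrative names): the report does not say how the six five-minute records in each half-hour slot were combined, so averaging TRAFFIC_SPEED per link and per slot (hh:00 and hh:30) is an assumption. Six records per slot is consistent with the table sizes given in the Database Design section: 1,776,960 / 6 = 296,160.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HalfHourAggregator {
    record Sample(String linkId, LocalDateTime captureDate, int trafficSpeed) {}
    record Bucket(String linkId, LocalDateTime start, double meanSpeed, int count) {}

    // Floors a time to the start of its 30-minute slot (hh:00 or hh:30).
    static LocalDateTime slotStart(LocalDateTime t) {
        LocalDateTime hour = t.truncatedTo(ChronoUnit.HOURS);
        return t.getMinute() < 30 ? hour : hour.plusMinutes(30);
    }

    // Groups samples by (link, slot) and averages their speeds; output is sorted
    // by link id, then by slot start.
    static List<Bucket> aggregate(List<Sample> samples) {
        Map<String, Map<LocalDateTime, int[]>> acc = new TreeMap<>(); // int[]{sum, count}
        for (Sample s : samples) {
            int[] sumCount = acc.computeIfAbsent(s.linkId(), k -> new TreeMap<>())
                    .computeIfAbsent(slotStart(s.captureDate()), k -> new int[2]);
            sumCount[0] += s.trafficSpeed();
            sumCount[1]++;
        }
        List<Bucket> out = new ArrayList<>();
        acc.forEach((link, slots) -> slots.forEach((start, sc) ->
                out.add(new Bucket(link, start, (double) sc[0] / sc[1], sc[1]))));
        return out;
    }
}
```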

3. Database Design
As mentioned above, a database is necessary for both the time-varying traffic condition analysis and the real-time traffic condition visualization.
Since the data are not very complicated, our database consists of three tables for data storage and one view as input for Tableau.
Tables:
1) Jtis_speedmap: store the original raw data provided by GovHK
Table Size: 1,776,960
Field Name | Type
link_id | char
region | char
road_type | char
road_saturation_level | char
traffic_speed | int
id | int
form_capture_date | datetime
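Loading the parsed records into this table could look like the JDBC sketch below. It assumes a MySQL table named `jtis_speedmap` with the columns listed above and an auto-increment `id`; the class and record names are hypothetical. Each file's 617 rows are sent as one batch inside a transaction, so the 2880 files need only 2880 commits instead of one commit per row as with autocommit on.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.List;

final class RawDataLoader {
    // One raw record; id is assumed to be AUTO_INCREMENT and therefore not inserted.
    record Row(String linkId, String region, String roadType, String saturationLevel,
               int trafficSpeed, LocalDateTime captureDate) {}

    static final String INSERT_SQL =
            "INSERT INTO jtis_speedmap (link_id, region, road_type, road_saturation_level, "
            + "traffic_speed, form_capture_date) VALUES (?, ?, ?, ?, ?, ?)";

    // Inserts all rows of one XML file as a single JDBC batch in one transaction.
    static void insertAll(Connection conn, List<Row> rows) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (Row r : rows) {
                ps.setString(1, r.linkId());
                ps.setString(2, r.region());
                ps.setString(3, r.roadType());
                ps.setString(4, r.saturationLevel());
                ps.setInt(5, r.trafficSpeed());
                ps.setTimestamp(6, Timestamp.valueOf(r.captureDate()));
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();   // leave the table unchanged if any row fails
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```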

2) Jtis_speedmap: store the raw data after aggregation (twice an hour)
Table Size: 296,160
Field Name | Type
link_id | char
region | char
road_type | char
road_saturation_level | char
traffic_speed | int
id | int
form_capture_date | datetime

3) Tsm_link_and_node_info: relationship between a link_id and its real-life longitude and latitude
Table Size: 617
Field Name | Type
link_id | varchar
start_node | int
start_node_eastings | double
start_node_northings | double
end_node | int
end_node_eastings | double
end_node_northings | double
region | varchar
road_type | varchar
View: all data provided for Tableau
Field Name | Type
region | char
road_type | char
road_saturation_level | char
traffic_speed | int
id | int
longitude | float
latitude | float
form_capture_date | datetime

Difficulties
1. Data Preprocessing
The raw data is not friendly to users or to Tableau. The only field related to geographic position in the real-time data is LINK_ID, which is the primary key of another static table containing the geographic positions of all the start nodes and end nodes of the links. However, these positions are given not as WGS84 longitudes and latitudes but in the HK 1980 Grid Coordinate system, so a transformation of the geographic information is necessary. This is done in Java.
The real-time data is captured every 5 minutes, which implies the need for granularity reduction. The aggregation is also done in Java, after which the interval is enlarged to 30 minutes.

2. Finding Insights in Maps and Charts


At the beginning, it was hard to find anything interesting in the maps or charts, because a large amount of data is used and there are several fields for classification or measurement, such as regions, road types, saturation states and time. The solution was to filter data and fields and to try different rows and columns, different chart types and different colour combinations. Tableau pages were also used for dynamic visualization over time, and editing the axes helped as well. All of these methods emphasize contrast and help reveal novel patterns.

3. Real-Time Visualization on Google Maps


The Google Maps API allows lines to be drawn and coloured on the map in JavaScript. However, if the lines need to be updated every five minutes, the data has to be captured and preprocessed in JavaScript, and the government server does not allow cross-domain requests. Our solution was to capture and save the data in the local database and send the requests for updated data to localhost.
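The project uses PHP for this localhost endpoint. Purely as an illustration of the same idea in Java, the sketch below uses the JDK's built-in `com.sun.net.httpserver.HttpServer` to serve the latest snapshot as JSON; the `/latest` path, the JSON field names and the `LinkStatus` type are hypothetical. The browser only avoids the cross-domain restriction if the map page is served from the same origin (scheme, host and port) as this endpoint; otherwise the endpoint would have to send an `Access-Control-Allow-Origin` header.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Supplier;

final class LatestSpeedServer {
    record LinkStatus(String linkId, String saturationLevel, int trafficSpeed) {}

    // Minimal JSON rendering; LINK_IDs and saturation levels contain no quotes,
    // so no string escaping is done here.
    static String toJson(List<LinkStatus> statuses) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < statuses.size(); i++) {
            LinkStatus s = statuses.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"linkId\":\"").append(s.linkId())
              .append("\",\"level\":\"").append(s.saturationLevel())
              .append("\",\"speed\":").append(s.trafficSpeed()).append('}');
        }
        return sb.append(']').toString();
    }

    // Serves the latest snapshot (as supplied, e.g., from the database) at /latest.
    static HttpServer start(int port, Supplier<List<LinkStatus>> latest) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
        server.createContext("/latest", exchange -> {
            byte[] body = toJson(latest.get()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```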

Visualization Approaches
Maps are used to give a direct, intuitive view. We also use line charts to observe fluctuation and make comparisons, and area charts to show changes in proportion. Many colour schemes were tried to achieve clear labelling and quantification: dark colours are used for important objects and light colours for less important ones. Last but not least, recorded videos are very important for getting a feel for the time-varying data.

Visualization Techniques
1. Java: data capture, data store, data aggregation, data transfer
2. JavaScript: Google Maps JavaScript API v3 for real-time traffic condition
visualization
3. PHP: fetch pre-processed data from local server
4. MySQL: data store, data export for Tableau, data support for real-time traffic
condition visualization
5. Tableau: the best tool for making the visualizations. Discoveries turned up through the use of pages, data filtering, all types of charts, painting the charts in different combinations of colours, etc.
6. Fraps: screen recording
7. Weebly: whole project presentation

Implementation
1. Process of preparing the data for Tableau

2. Real-time traffic condition visualization

Answers to the Questions


To find out whether there is a traffic jam at a certain place, just refer to the real-time traffic map.
According to the line charts, morning and evening peaks seem inevitable everywhere. Traffic tends to be in good condition at times far away from these peaks.

New Insights
1. Traffic jams usually occur on the same roads: the sections of the New Territories Ring Road adjacent to Tuen Mun and Sha Tin Metro Station, Waterloo Road, Princess Margaret Road, West and East Kowloon Corridor, and Gloucester Road from Admiralty to North Point.

2. For urban roads, traffic speed on Hong Kong Island fluctuates most over the course of a day, followed by that in Kowloon, while there is little fluctuation in traffic speed in Tuen Mun. Traffic speed in Sha Tin is the highest.

3. Most of the time, traffic jams are most serious on HK Island. Morning peaks on HK Island and in Kowloon usually appear at 9:00, but those in Sha Tin tend to start at 8:00.

4. From Sunday to Monday, the average traffic speed on major routes drops sharply to a low level, reflecting a serious Monday morning peak.

5. Traffic speed on major routes is generally much higher than on urban roads. However, during traffic jams, urban roads are the better choice, except those in Tuen Mun.

Unaccomplished Visualization or Functionalities


In the proposal, one question to be answered was how to give the best guidance to people who want to find the most time-saving route between two places. This was not realized in the project. The first reason is that the data provided do not cover all the roads in Hong Kong; only a subset of the roads (617 links) is included. Besides, Google Maps already does very well in this area, so it may not make much sense to duplicate this work.
Our focus finally returned to real-time and time-varying data visualization, and there are quite a few interesting findings.
