
Chapter 5

Data analysis and visualisation


Key knowledge
After completing this chapter, you will be able to demonstrate knowledge of:
stages of the problem-solving methodology
types of information problems and users' needs that can be met through presenting information in visual forms
problem-solving activities related to analysing information problems
types of data visualisations
sources of authentic data
data types and data structures relevant to selected software tools
purposes of data visualisations
suitability of different types of visualisations that meet users' needs
design tools for representing data visualisations
needs of users that can influence the type and presentation of visualisations
criteria and techniques for evaluating visualisations
characteristics of file formats and their ability to be converted into other formats
functions of appropriate software tools to select required data and to manipulate data when developing visualisations
formats and conventions applied to visualisations in order to improve their effectiveness for intended users.

For the student


If you can imagine the amount of data that is generated every day just by
using social-networking tools, you might also be able to imagine that there is
someone, somewhere, who is searching through a mountain of data looking for
meaning.
Data visualisation is the process in which we take large amounts of data
and process it into effective graphical representations that will meet the
needs of users or clients. These representations can take the form of charts,
graphs, spatial relationships and network diagrams. In some cases, the data
visualisation might involve interactivity and the inclusion of dynamic data
that allows the user to deduce further meaning from the visualisation.

For the teacher


This chapter introduces students to the knowledge and skills needed to
use software tools to access authentic data from repositories and present
the information in a visual form. The key knowledge and skills are based
on the Unit 2 Area of Study 1. If a data visualisation is effective, it reduces
the effort needed by readers to interpret information. This chapter takes
students through the different types of visualisations, and then uses a
case study to explore some of the tools available to process data from
repositories, such as the Bureau of Meteorology and the Australian Bureau
of Statistics.

Revising the problem-solving methodology


As you might remember from Unit 1, problems exist when the current
information system does not meet the information needs of the clients
or users or when there is a new need identified. This could be a simple
problem, such as miscalculations in a database, or a more complex one
that involves finding an appropriate tool to display otherwise complex and
incomprehensible data in a visual form.
The process that we go through is the application of the problem-solving methodology. You may remember that this is a framework that can
be used to guide the development of an ICT solution. Each stage of the
methodology takes you through key aspects of the solution to consider,
while always focusing on the needs of the user (Figure 5-1).

[Figure: the four stages of the problem-solving methodology and their activities.
ANALYSIS: Requirements, Constraints, Scope
DESIGN: Solution design (functionality, appearance), Evaluation criteria
DEVELOPMENT: Manipulation, Validation, Testing, Documentation
EVALUATION: Strategy (measuring), Reporting]

FIGURE 5-1
Problem-solving methodology.

Many project managers invest the majority of their time at the analysis stage
of the problem-solving methodology to prevent time being spent on unnecessary
tasks at the development stage.

The problem-solving methodology is explained in detail on page xiii.

Presenting information in a visual form


Many people prefer information to be presented in a visual form rather than
lines of text or figures. If we take a close look at an information problem that
deals with the processing of a lot of complex data, often the needs of the users
can be met by presenting the data in a visual form.
9780170187466

Edward Tufte is known for his work with information design. See
www.edwardtufte.com


Information Technology VCE Units 1&2

The website Studying Style <www.studyingstyle.com/> claims that 65
per cent of people are visual learners.

A distribution of data is when collected data is spread over a period of
time or dates; for example, data on how many people use public transport.

For example, a secondary school might have a problem with large
numbers of students signing in late every day. It is assumed that there
is an issue with student behaviour, and there is a ten-page report that
is printed every day and distributed to each year-level coordinator.
The information problem is that the ten-page printout is difficult to
understand because of the vast quantities of data. A report listing
every late student at a school might seem impressive and informative,
but the list is just that: a list of data (Figure 5-2). The large report
meets the needs of a year-level coordinator chasing up students;
however, the list does not show the distribution of latecomers or
whether there is a relationship between lateness and different days of
the week.

Time   Student name   Surname      Home group   Excuse
8.35   Scott          Kelly        7A           Train was late
8.35   Michael        Fitzgerald   12B          Slept in
8.36   Sally          Saunders     7C           No excuse, just late
8.36   June           Ho           9D           Tram was late
8.36   Rosanna        MacKenzie    9A           Train
8.37   Dion           Lei          9A           Missed the train
8.37   Tony           Marchetti    9A           Train was late
8.40   Yolanda        Kerr         7C           Train was late
8.40   Finn           Crisp        10B          Orthodontist
8.41   Hong           Chen         10B          Car broke down
8.41   Tim            Morris       11E          Mum's fault
8.42   Christine      Nguyen       12B          Train

FIGURE 5-2
Late student list shows information that is only meaningful to the year-level coordinators.

Think about IT 5-1


What other data sets might become
more meaningful when distributed
across a timescale?


There are three different ways in which the information on lateness
could have been presented:
written, such as the ten-page report, which was generated from the
system
oral, such as a voicemail or broadcast
visual, such as a histogram or a chart.
In the case of the late students, a chart will provide a more immediate
understanding of the distribution of data across a timescale from 8.30 a.m.
to 9.30 a.m.
Using a simple data visualisation, such as a histogram or a line graph
(see Figure 5-3), we can identify when the majority of students are signing
in late and this information might be used by school administration to
identify a pattern and formulate a plan of action.
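
The grouping that sits behind a chart like this can be sketched in a few lines of Python. This is only an illustration, not the school's actual system: the sign-in times are the twelve rows of Figure 5-2, while the helper name `count_by_interval` and the five-minute interval width are assumptions made for the example.

```python
from collections import Counter

# Sign-in times (hours.minutes) taken from the twelve rows of Figure 5-2.
SIGN_IN_TIMES = ["8.35", "8.35", "8.36", "8.36", "8.36", "8.37",
                 "8.37", "8.40", "8.40", "8.41", "8.41", "8.42"]


def count_by_interval(times, width=5):
    """Count sign-ins per interval of `width` minutes (hypothetical helper)."""
    counts = Counter()
    for t in times:
        hours, minutes = (int(part) for part in t.split("."))
        start = minutes - minutes % width  # round down to the interval start
        counts[(hours, start)] += 1
    # Sort numerically by (hour, minute), then format the labels.
    return {f"{h}.{m:02d}": n for (h, m), n in sorted(counts.items())}


late_counts = count_by_interval(SIGN_IN_TIMES)
for interval, count in late_counts.items():
    # A row of '#' characters acts as a rough text histogram.
    print(interval, "#" * count, count)
```

Each printed row corresponds to one point on the line graph: the interval on the time axis and the number of late students on the vertical axis.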

[Figure: line graph titled 'Students late on Monday'. Horizontal axis: Time, from 8.30 to 9.30 in five-minute intervals. Vertical axis: Number of late students, from 0 to 30.]

FIGURE 5-3
The line graph shows the distribution of late students against the time that they have
arrived.

The line graph might reveal a common time that students are late every
day (Figure 5-4). This information might encourage further investigation,
which might reveal an issue with transport or the distance that those
students travel to school each day. The presentation of the information in a
visual format allows us to clearly see patterns in the data that we might not
have seen before. Used correctly, data visualisation can increase the clarity
and usability of information, therefore increasing its effectiveness.

Think about IT 5-2

In what ways does your school communicate its data on late students
to staff?

Increasing the clarity of a solution
involves presenting the information in an
easy-to-read format. There should be no
confusion as to what the information is
trying to communicate.
Presenting information in a usable format
means that it is capable of being used for
what it was intended. It requires no more
formatting to meet the needs of users.

[Figure: line graph titled 'Late students for week 4 in Term 1', with one line each for students late on Monday, Tuesday, Wednesday, Thursday and Friday. Horizontal axis: Time, from 8.30 to 9.30 in five-minute intervals. Vertical axis: Number of late students, from 0 to 35.]

FIGURE 5-4
The line graph shows the distribution of late students for every day of week 4 in Term 1.
The way this data is displayed might provide an easier way to understand the data,
hopefully highlighting a trend in arrival times.

Think about IT 5-3


Can you think of other types of charts that
might be useful for analysing the data on
late students?

Presenting a data visualisation solution


When users or clients are looking either for greater clarity from a set of
information or to identify patterns or relationships within data sets, they might
be looking for a data visualisation solution, especially if the data involved
would normally be difficult to analyse in a written or oral format (see
Figure 1-15).

More and more data is being presented in a visual form. For example, Wordle
<www.wordle.net/> can be used to
create a text cloud, which shows the
frequency of certain words being used in
a document or on a webpage.

Think about IT 5-4

Using the Wordle tool at www.wordle.net/, use the text from your school
newsletter to create a text cloud. Does
this data visualisation tool clearly show
any key themes from your school
newsletter?

If the user is looking to:
compare data, a bar chart might be used
show a distribution of data, a histogram might be used
show a relationship between two data sets, a scatter diagram might be used
show a composition of data that changes over time, this might involve an animation or simulation.
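
The list above can be treated as a simple lookup from the user's purpose to a suggested visualisation. The following Python sketch is illustrative only: the dictionary keys and the function name `suggest_chart` are assumptions, and a real decision would also weigh the needs of the users.

```python
# Purpose-to-chart suggestions taken from the list above.
SUGGESTED_CHART = {
    "compare": "bar chart",
    "distribution": "histogram",
    "relationship": "scatter diagram",
    "composition over time": "animation or simulation",
}


def suggest_chart(purpose):
    """Return the chart type suggested for a purpose (hypothetical helper)."""
    key = purpose.strip().lower()
    if key not in SUGGESTED_CHART:
        raise ValueError(f"Unknown purpose: {purpose!r}")
    return SUGGESTED_CHART[key]


print(suggest_chart("Distribution"))
```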
Often before we start this process, we have an idea of what the solution
might roughly look like based on the needs of the users. We might ask the
following questions:
Does the information problem involve data analysis?
Does the data need to be compared?
Does a distribution need to be identified from the data set?
Does a relationship between two sets of data need to be identified?
Does the user need to make a decision from a collection of data?
Is there a need for the data to be communicated in a graphical way
because of the types of users?

Analysing information problems


Abraham Lincoln is quoted as saying, 'If
I had eight hours to chop down a tree,
I'd spend six hours sharpening my axe.'
Time spent analysing and designing an
information problem is never wasted.

Many people believe the analysis stage of the problem-solving methodology
is the most important part. If you get this stage correct, your solution is more
likely to be successful as you will have a clear understanding of the problem.
This analysis stage of the problem-solving methodology often involves the
process of determining the scope, requirements and constraints of a solution.

CASE STUDY:
Southside Makers
Southside Makers is a not-for-profit craft association that supports
craftspeople, hobbyists and micro businesses in Melbourne's southern
suburbs. The Southside Makers comprise about forty people who get
together socially to craft, and they also run a market every month in
St Kilda Town Hall to promote handicrafts.
The Market Committee includes three hardworking volunteers,
and all work contributing to the success of the market is done for the
love of it.
As part of the Southside marketing strategy, a letterbox drop has been
done a few weeks before the market in suburbs surrounding the market.
Members of the craft club also place bundles of pamphlets in cafés in the
hope that customers will pick one up and then visit the market. In the
past the letterbox drop has been an unplanned approach, with volunteers
picking random streets near their homes or in the suburbs surrounding
the market.
The last few markets have seen a drop in attendance;
as a result the stallholders have not made as many sales as they
would normally have done. The stallholders have requested that the
Market Committee investigate possible reasons behind this drop in
attendance.


FIGURE 5-5
Southside Makers Market is run by volunteers.

Remember that the first step in problem-solving is conducting an analysis.
The analysis involves an investigation of the factors that affect the problem
(constraints), identification of the needs of the users (requirements), and
the output that must be produced to meet those needs (scope). A common
technique is to write out the problem as a short statement or question.

CASE STUDY:
Southside Makers
After lengthy discussion by the Market Committee, the following
constraints were identified:
Southside Makers do not have a lot of resources to put towards a
solution, so any solutions have to be reasonably priced with minimal
impact on the information system that they already have.
Southside Makers also identified a time constraint. The solution had
to be achievable before the next market.
Southside Makers have devised a short statement that defines their
information problem: How can Southside Makers improve the effectiveness
of their marketing campaign to increase attendance at their markets?

Project constraints are often defined by the time and resources (data, people,
software, procedures) available for the
solution. Another constraint may be
which area of the organisation is being
targeted.

Obtaining feedback

CASE STUDY:
Southside Makers
At the last market, the Southside Market Committee decided to undertake
a visitor survey, both online and in hard copy, to try to identify the average
market visitor. It is hoped that the information obtained can be used
to make changes to the current marketing strategy.

Quantitative data is measurable and specific; for example, the number
of students who attend assembly each
week.
Qualitative data is harder to
measure and deals with people's opinions
and feelings towards something.

One popular online survey tool is SurveyMonkey <www.surveymonkey.com/>.
Registered account holders can
download reports and the raw data is
given in a CSV format. This data can then
be used to create data visualisations.
SurveyMonkey was discussed on
page 130.

When seeking feedback there are two types of data that can be gathered:
quantitative and qualitative.
Quantitative data is measurable and specific and therefore easier to
chart or graph. At a simplistic level, quantitative data gathering is based on
verifying theory through the use of statistics and largely numerical data,
whilst qualitative data gathering typically generates theory. An example of
quantitative data is: 'Fifty per cent of market attendees bought a product from
the market.' An example of qualitative data gathering is more descriptive: 'On
a cold, wet winter's day, many older members of the community turned up
to the market dressed in raincoats, wearing scarfs, and most carried umbrellas.'
Quantitative data can be gathered by using the following data gathering
instruments:
surveys
questionnaires
observation.
When data has been gathered using the instruments mentioned above,
quantitative data can be analysed by using software such as Excel, SPSS,
Minitab or SAS. This takes time and often involves hours of data entry
depending on the complexity of the data gathering instrument (Figure 5-6).
For simple data gathering, online survey tools such as SurveyMonkey allow
users to create surveys and manage the collection and analysis of quantitative
data. Results are often saved into a database and then downloadable in
CSV (comma-separated value) format (see Figure 5-7 and Figure 5-8).
SurveyMonkey also permits qualitative data to be entered.

FIGURE 5-6
Data gathering doesn't have to be complicated: it just needs to be effective.

What is a CSV file?


A CSV file is a plain text file that is specially
formatted to store spreadsheet- or
database-style information. Each line of
the file has one record on it, and each
field within that record is separated by a
comma.
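
To see how the 'one record per line, fields separated by commas' rule works in practice, here is a minimal Python sketch using the standard `csv` module. The column names and the three rows are made up for illustration; a real survey export would have its own headings.

```python
import csv
import io

# Hypothetical survey export; the column names and values are made up.
RAW_CSV = """visitor_id,age_group,heard_via
1,30-45,Facebook
2,18-29,Invited by a friend
3,46-60,Letterbox drop
"""

# io.StringIO lets the csv module read the text as if it were a file.
reader = csv.DictReader(io.StringIO(RAW_CSV))
records = list(reader)  # one dictionary per line (record) of the file

for record in records:
    print(record["visitor_id"], record["age_group"], record["heard_via"])
```

The first line of the file supplies the field names, and every later line becomes one record, which is what a spreadsheet program does when it places the data into columns.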

FIGURE 5-7
A CSV file opened in Notepad. Data is unreadable and hard to understand, but the file
format means that the file size is very small.

FIGURE 5-8
A CSV file opened in a spreadsheet program. The information starts to become more
meaningful when put into columns.

CASE STUDY:
Southside Makers
The Southside Market Committee could see from the data that they had
gathered that:
The key demographic of people attending the market is women aged
thirty to forty-five, with at least one child.
The establishment of a kids' craft table was one feature of the market
that rated highly on the survey.
The letterbox drops have been largely ineffective, with few people
identifying a pamphlet in their letterbox as their reason for attending.
A significant number of market attendees picked up a pamphlet from
their local café and cited this as the reason why they attended the market.
Additional feedback was obtained from Southside Makers Market
stallholders on the day, and informal or verbatim feedback was gathered
and compared with the data being obtained through formal surveys.

Think about IT 5-5

What are some of the other techniques that Southside Makers can use to
obtain feedback from visitors, members
and stallholders?

Demographic refers to the
characteristics of an area or group of
people. For example, a demographic may
be people aged 18 to 25 from non-English
speaking backgrounds, or aged 75 and
above and living by themselves.

Quantitative data is easy to communicate using data visualisations,
such as pie charts. The pie chart in Figure 5-9 clearly shows how visitors
found out about the market. What percentage found out through pamphlets,
newspapers or friends?

[Figure: pie chart titled 'Visitors attending Southside Market'. Legend: Facebook; Poster in a shop or café; Was walking past; Another craft blog; Letterbox drop; Newspaper listing; Invited by a stallholder; Southside Makers blog; Invited by a friend; White hat listing.]

FIGURE 5-9
A pie chart showing quantitative data.
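
Each wedge of a pie chart is simply a category's share of the total. The Python sketch below shows that calculation; the response counts are hypothetical (the chapter does not publish the numbers behind Figure 5-9), only four of the categories are used, and the helper name `to_percentages` is an assumption.

```python
# Hypothetical response counts; the real survey numbers are not given.
RESPONSES = {
    "Facebook": 12,
    "Poster in a shop or café": 18,
    "Letterbox drop": 3,
    "Invited by a friend": 17,
}


def to_percentages(counts):
    """Convert raw counts to percentages of the total, to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


shares = to_percentages(RESPONSES)
for label, share in shares.items():
    print(f"{label}: {share}%")
```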


A text cloud can be used to visually represent the words from a survey, a
chapter of a book or a collection of tags
from a blog. Wordle <www.wordle.net/> is
a good example of a text cloud creator.
Queensland-based company Leximancer
is pioneering the area of data
visualisations using qualitative data
to create themes, concepts and their
associated relationships. On Leximancer's
website <www.leximancer.com/>,
interesting data-visualisation examples of
key reports are given, such as the 2009
Australian Defence White Paper.

Qualitative data is harder to measure than quantitative data. When
gathering qualitative data, interviews, video footage and observation are
the usual data gathering instruments. Generally, the data from these instruments
needs to be recorded accurately and transcribed at a later stage. The analysis
of qualitative data is also quite different to that of quantitative data. With
quantitative data, the researcher looks for themes or patterns through the use
of numbers, whereas with qualitative data, the researcher establishes rich
descriptions and finds themes through classification and coding.
A text cloud can be used to analyse qualitative (that is, non-specific and
harder to measure) feedback from the stallholders. Key words immediately
jump off the page, clearly communicating what was emphasised in the text.
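
A text cloud is built on a word-frequency count: the more often a word appears, the larger it is drawn. The Python sketch below shows the counting step only. The comments and the list of ignored words are invented for illustration, and this is not a description of how Wordle itself works.

```python
import re
from collections import Counter

# Hypothetical stallholder comments, invented for illustration.
FEEDBACK = [
    "Loved the music but we need more customers",
    "Customers enjoyed the music and the kids table",
    "More customers please",
]

# Very common words that carry little meaning are left out of the count.
STOP_WORDS = {"the", "but", "we", "and", "need", "more"}


def word_frequencies(comments):
    """Count how often each non-stop word appears across all comments."""
    words = re.findall(r"[a-z']+", " ".join(comments).lower())
    return Counter(w for w in words if w not in STOP_WORDS)


frequencies = word_frequencies(FEEDBACK)
print(frequencies.most_common(2))
```

In a text cloud the words with the highest counts would be drawn largest, which is why they appear to jump off the page.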

CASE STUDY:
Southside Makers
A simple text cloud was used to analyse the qualitative data from online
surveys from the market to see whether there were any patterns in what
the stallholders and visitors were saying about the market.
Upon closer analysis, several words jumped off the page, such as
'music' and 'customers' (Figure 5-10). This visualisation was then used,
along with the quantitative data, to compare how many stallholders were
happy with their day and whether they sold enough products.

FIGURE 5-10
A text cloud from qualitative feedback from stallholders created with Wordle.

To scope a problem, we clearly define what the problem is and what the solution
will affect.

Data visualisation might be used as part of an information solution, but
it can also be used at the analysis stage to identify further data that might
be needed to properly scope the problem before moving forward to the
design phase.

CASE STUDY:
Southside Makers
The current marketing strategy of letterbox drops does not target areas of the
community representative of the average market visitor or buyer. As a result,
the volunteers are not using their time effectively to distribute pamphlets.

Through discussions with stallholders, the committee feels that there
is an additional need to attract more customers with higher disposable
income so that they may purchase more products from stallholders. The
following questions are asked: 'Where do high income earners hang out?'
and 'How would we find out this information?'
Being a volunteer organisation, marketing is done on the cheap. The
committee needs to come up with a way to work smarter in terms of
advertising the Southside Makers Market.

Defining the problem


Based on the feedback from stallholders and visitors to the market, the Market
Committee identified an area in which they believe problems exist.
These problems are causing frustration for stallholders as the market
threatens to become no longer viable for them to attend. A possible solution
to this problem is to identify effective areas to do letterbox or café pamphlet
drops, where there is high disposable income.

Solution requirements
Imagining the solution before it has been produced takes a lot of thought. How is
the organisation going to use the information? In what format does the solution
need to be presented? The information being produced will be influenced by
the type of data gathered. If the Southside Makers were to do a shopping strip
survey on a Saturday morning in St Kilda, they would receive very different data
than they would from knocking on doors in the area during the week.

CASE STUDY:
Southside Makers
Southside Makers Market need information that will help them with
targeted letterbox drops in the St Kilda area. They need to reach the
householders who have money to be spent on impulse items at a market.
The information to meet their needs is quite specific.
The Market Committee could go doorknocking in the area to find out
what the average earnings are, or they could survey people on a Saturday
morning at the nearby shopping strip. However, both of these data-collection methods take time, and the quality or quantity of information
is not guaranteed. Another option is to use data that is freely available
online and is collected by a reputable data-collection organisation.
Using freely available data obtained from the Australian Bureau of
Statistics (ABS), the Market Committee can identify:
the income of people living in St Kilda
the different types of households in St Kilda; for example, single or
families.


The ABS manages quite a number of data sets: from the National Census data
collected every five years to the consumer
price index. Its data can be manipulated
in text form, or viewed as a thematic map
through its MapStats application on its
website at www.abs.gov.au/.

Think about IT 5-6


Use the ABS website <www.abs.gov.au> to find out more about your
community. Click on Census Data
and then on Community Profiles
for statistical data. If you are after a
visualisation of the statistics, click on
MapStats.

FIGURE 5-11
An example of a thematic map from the Australian Bureau of Statistics website showing
the distribution of data. Dark areas show high-income households, which are defined as
earning more than $2500 per week.
A data set is a collection of data; for
example, census data collected every five
years might be regarded as a data set.

A comparison of these data sets should show the Southside Market
Committee the areas around St Kilda in which they should be performing
their letterbox drops (see Figure 5-11 for an example).
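
The selection that a thematic map makes visible can also be expressed as a simple filter. In the Python sketch below, the area names and income figures are hypothetical and are not ABS data; the $2500 per week threshold follows the definition of a high-income household given with Figure 5-11, and the helper name `high_income_areas` is an assumption.

```python
# Hypothetical areas and weekly household incomes (not ABS figures).
WEEKLY_INCOME = {
    "Area A": 2650,
    "Area B": 1900,
    "Area C": 3100,
    "Area D": 2400,
}

HIGH_INCOME_THRESHOLD = 2500  # dollars per week, as in Figure 5-11


def high_income_areas(incomes, threshold=HIGH_INCOME_THRESHOLD):
    """Return areas earning more than the threshold, highest first."""
    selected = [area for area, income in incomes.items() if income > threshold]
    return sorted(selected, key=lambda area: incomes[area], reverse=True)


targets = high_income_areas(WEEKLY_INCOME)
print(targets)
```

A list like `targets` tells the committee which areas to visit, but a map of the same data would also show where those areas are in relation to the market.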

Constraints and limitations


Constraints might involve:
time: the time taken to produce the solution or a timeframe in which the solution needs to be created
resources: data, people, hardware or software that need to be used as part of the solution.

A constraint is any factor that influences the nature of the solution. This may
be the time that it might take to produce the solution, the style and size of the
solution or the characteristics of the users. It is important to try to identify
as many constraints as possible before heading into the design phase of the
problem-solving methodology, such as:
Does the solution need to appeal to a certain gender?
Does the solution need to appeal to a certain learning style?
If the resulting solution is a data visualisation, is the audience familiar with the problem or will they be seeing this solution for the first time?
How much detail does the solution need to have?

CASE STUDY:
Southside Makers
A constraint on the information needed for the Market Committee might
be an easy-to-read solution, where the committee can draw meaning from
it reasonably quickly.
As the committee is dealing with a local area, the solution could
provide them with a map that they can use to distribute pamphlets for
the next market.
A list of statistics listing each street and its average earnings would
not assist the committee in making decisions about how to advertise
their market. The problem in this case study is to provide the Market
Committee with accurate data so that they can plan their marketing
strategy for their next event.

The limitations of a solution also need to be considered. Will there be
any variables that will limit the usability of the solution, such as:
Does the solution have technical limitations?
Can the solution only be viewed by a certain piece of software or by a
certain browser?
Do you need to register or have an account to access the solution?
Is the solution time-sensitive? Does it have to be communicated within a
timescale?
Are there any aesthetic limitations that prevent the visually impaired from
understanding the solution?

Limitations might be:
technical, when using the solution
economic, such as needing to purchase software or a membership to view a solution
aesthetic, which might prevent a visually impaired person from viewing the solution.

Think about IT 5-7

What constraints and limitations might
affect the development of a school
website?

Types of data visualisations

When IBM launched its personal computer range in the 1980s (Figure 5-12), one
of its main marketing angles was that the average person could use software tools
to be more productive, in particular through spreadsheets to process and analyse
their data. Images of people getting excited by green writing on black screens were
broadcast everywhere, paired with images of graphs and charts being used to
make otherwise confusing data more meaningful and easier to understand.

FIGURE 5-12
IBM released the 5150 PC in September 1981.

Now, we have more data and information at our fingertips than ever before.
A report in The Economist in February 2010 estimated that we are
bombarded by up to 34 gigabytes of data every day by the various tools and
websites that we access. Social-networking tools have greatly
added to the mountain of data that is created every day (see Figure 5-13).
But with challenges, there are also opportunities: every day another
data-visualisation tool is being developed to cut through this constant
stream of data and give users something that they can work with.
Data visualisation is a visual representation of data that would
normally be quite hard to understand and derive meaning from.
Displaying data in a visual form can add clarity and provide the
end user with a snapshot of what the data is trying to communicate.

Juice Analytics has a handy little widget on their website
<www.juiceanalytics.com/chartchooser/> that allows you to
choose the reason why you are charting,
and then you can download into your
spreadsheet package a blank template to
play around with.


FIGURE 5-13
This diagram shows a visual representation of the data being pushed out by Twitter every day.

Often people who present to an audience will use data visualisations to
communicate a point because it is easier for the audience to understand quickly.
The four main ways in which data visualisation can be used are:
comparing data
showing distribution of data
showing relationships between data
showing the composition of data.

Data visualisations to compare data

There are a number of creative variations on the humble bar chart. Icons,
words and 3-D shading can be used to
emphasise the data being visualised.


Comparing data is one of the most common practices for businesses: 'How
much did we sell last year compared with this year?' or 'How many people
use public transport on a Monday compared with a Friday?' A comparative
visualisation is one of the easiest to set up and understand.
A chart is the most common data visualisation used. It is a visual
representation of a set of data. The charts that might be used to show a
comparison between two data sets are: column charts, bar charts, line
charts, pie charts and comparative maps.
For example, a pie chart can be used to show a comparison between
traffic sources for a website (see Figure 5-14). The colours and ratio of the
wedges provide the user with an immediate understanding of the data
being shown, and they can see immediately where people are coming
from. Are they arriving at this website directly, via a search engine or are
they being referred from another website? This form of data visualisation
would be far more useful than a series of bar charts.

FIGURE 5-14
The Google Analytics pie chart showing the origin of traffic sources for a website.

Similarly, a data list ranking the country of origin of website visits is
informative and interesting (Figure 5-15), but a comparative map provides
immediate visual feedback for the website developer, who can
clearly see the spread of interest across the globe (Figure 5-16).

Think about IT 5-8


Can you think of any other data sets
that you might want to compare with
these visualisations?
Can you think of any other data sets
that you can display using similar data
visualisations?

FIGURE 5-15
Google Analytics provides a data list of website visits from countries, ranked according
to the most popular origin.
Complex data manipulated in a
spreadsheet (Figure 5-15) can be
visualised using a data visualisation
tool to make it easier to understand
(Figure 5-16).

FIGURE 5-16
The Google Analytics comparative map can instantly show you where the most
hits are coming from for your website.

A line chart or histogram might be a good tool to use to show a
comparison of site visits over time (Figure 5-17). The web developer can
then start to track relationships between site visits and site updates.

FIGURE 5-17
Google Analytics line histogram showing visits over time: the manager of the website
can observe changes in viewer habits over a period of time.

Data visualisations to show distribution of data

A histogram shows how data has changed over a timeframe; for example,
hours, days, months and years.

Distribution of data can be used to analyse when an event is happening across
a timeframe. We are not comparing two sets of data; rather, we are looking at
how the data relates to a fixed item, such as a time, date or even grades. Some
common data visualisations used to show distribution of data are column
histograms, line histograms and scatter charts.

Column histograms
Figure 5-18 features a column histogram that shows how hits on a website
are distributed over a period of thirty days.

FIGURE 5-18
SiteMeter column histogram for 30 days. This popular diagnostic tool allows
clients to view a histogram of visits to their registered website across a period of
7 days, 30 days or 12 months. This figure is an example of a histogram for the craft
blog Konstant Kaos.

The user can clearly see that there was a spike in hits on a particular
day; they can then investigate whether it was caused by something that the
website did or by an external force, such as someone linking to
the website from a blog.
Using the column histogram, it is far easier to understand and
identify patterns compared with the raw data (Figure 5-19). A histogram
of today's hits clearly shows that people check the website at the start of
their work day and at lunch time (see Figure 5-20). This may influence a
decision in regard to advertising a product.
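
As a rough illustration of how raw hit data becomes a column histogram, the sketch below counts hits per hour of the day and prints a simple text chart. The timestamps are invented sample data and the function name `hits_per_hour` is hypothetical; this is not how SiteMeter or Google Analytics work internally.

```python
from collections import Counter
from datetime import datetime


def hits_per_hour(timestamps):
    """Count hits in each hour of the day (0-23) from ISO-format timestamps."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


# Invented sample log: one timestamp per website hit.
sample_hits = [
    "2011-03-14T08:55:10",
    "2011-03-14T09:02:41",
    "2011-03-14T09:15:03",
    "2011-03-14T09:47:26",
    "2011-03-14T12:05:12",
    "2011-03-14T12:30:58",
    "2011-03-14T17:20:00",
]

hourly = hits_per_hour(sample_hits)

# Print one text "column" per hour that received at least one hit.
for hour, count in enumerate(hourly):
    if count:
        print(f"{hour:02d}:00 {'#' * count}")
```

Grouping the hits into hourly intervals is what turns a long list of raw records into a distribution that can be read at a glance.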

Think about IT 5-9


Google Analytics and SiteMeter are just
a couple of the many tools available to
track activity on a website. What does
your school use?

FIGURE 5-19
A SiteMeter data set that shows hits on a website. In this form, it is harder to see the
distribution of data across the day.

Line histograms
The most common line histogram is the bell curve. Many educational institutions
use this model to ensure that there is a spread of results for each subject. For
the bell curve in Figure 5-21, for example, students with marks near the peak of
the curve are average students; those on either side of the peak are below or
above average. The majority of student marks are near the peak of the bell curve.

Scatter charts
Scatter charts can be used to show a distribution of data against a fixed unit, such
as time. For example, a scatter chart might be used to visually show the time that
students are signing in late along one axis, and which year level they are in to see
if there is a relationship between the distributions of the data (see Figure 5-22).

A scatter chart can be used to show a distribution of data against a fixed unit
of time.


FIGURE 5-20
SiteMeter hits on a website across a day. This column histogram clearly shows that
people access this website at certain times of the day.

[Figure: bell curve for all students taking a test. Norm-referenced tests (NRTs)
are designed to compare student performance with other students. Horizontal
axis: scores in percentiles (0, 50, 100), with regions labelled Below average,
Average and Above average.]

FIGURE 5-21
Example of a bell curve.

Data visualisations to show relationships between data
Showing a relationship between two or more data sets is a little more
complex and requires more planning. You need to understand what
you are trying to communicate to be able to organise the data in such
a way that the visualisation can communicate your message. Two data
visualisations used to show relationships between data are scatter charts
and bubble charts.

[Figure: scatter chart titled 'Students that are late to school'. Horizontal
axis: time students arrived (8.34 to 8.50). Vertical axis: year level (7 to 12).]

FIGURE 5-22
Scatter charts can show the distribution of data. In this example, you can clearly see that a
group of Year 9 students arrived at the same time on the day that the data was processed.

Scatter charts
A scatter chart can also be used to show the relationship between two data
sets. In Figure 5-23, for example, the scatter chart shows a relationship between
two sets of data: the duration of volcano eruptions and the waiting time between
eruptions. In a scatter chart, the independent variable (the eruption duration)
goes on the horizontal axis, while the dependent variable (the waiting time
between eruptions) goes on the vertical axis. The chart clearly shows a
relationship between the two sets of data, and to assist with understanding
this data, we can draw a line of best fit through the points.
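
A line drawn through a scatter chart is usually calculated rather than drawn by eye. The sketch below uses the ordinary least-squares method, which is one common way to do this; the eruption readings are invented for illustration and the function name `line_of_best_fit` is hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented readings: eruption duration (min) and waiting time (min).
durations = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
waits = [52, 58, 66, 71, 79, 84]

slope, intercept = line_of_best_fit(durations, waits)
print(f"waiting time = {slope:.1f} x duration + {intercept:.1f}")
```

A positive slope suggests that longer eruptions tend to be followed by longer waits; most spreadsheet tools can add the same kind of trend line to a chart.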

[Figure: sample scatter graph. Horizontal axis: eruption duration (min), 2.0 to
5.0. Vertical axis: waiting time between eruptions (min), 40 to 100.]

FIGURE 5-23
The volcano scatter chart clearly shows the waiting time between volcano eruptions and
the duration of the eruption. A line of best fit can then be drawn to show the relationship.


Bubble charts
Bubble charts are quite similar to scatter charts, but one set of data is shown
as bubbles whose size represents a quantity or percentage. Take a look at
Figure 5-24. A random survey of stallholders at the Southside Makers Market and
their sales from the day reveals a relationship between the number of products
that they had on sale and the amount of sales that they achieved.
The raw data for the bubble chart is easy enough to read and
understand, but you cannot easily see the relationship between the two
sets of data (see Figure 5-27). A line of best fit (also known as a trend line)
could also be added to the bubble chart to show more clearly how the
relationship works.

The Whitburn Project
The Whitburn Project <http://waxy.org/2008/05/the_whitburn_project/> is
a labour of love by a group of obsessive record collectors. In their quest to
electronically preserve high-quality recordings of every popular song since the
1890s, they have created a spreadsheet of 37 000 songs and 112 columns of raw
data. Search around the Internet, and you can see the analysis of this project
presented as a collection of scatter and bubble charts that show some interesting
relationships between song duration and how long a song stays in the charts. (See
Figures 5-25 and 5-26.)

[Figure: bubble chart titled 'Analysis of sales at Southside Makers Market'.
Horizontal axis: total products for sale (10 to 30). Vertical axis: total
sales ($).]

FIGURE 5-24
This bubble chart takes raw data and displays it in a more meaningful way.

[Figure: chart titled 'Average Song Duration (1944-2008)'. Horizontal axis:
year, 1944 to 2008. Vertical axis: song duration (minutes : seconds), 0:00:00
to 0:05:02.]

FIGURE 5-25
Scatter chart of Average Song Duration from The Whitburn Project.


FIGURE 5-26
A tag cloud (from The Whitburn Project) of words appearing in songs.

FIGURE 5-27
Southside Makers raw sales data. The market share percentage is calculated by working
out the total sales for the market and what percentage that stallholder received.
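
The market share calculation described in the caption can be sketched as follows. The stall names and sales figures are invented and the function name `market_share` is hypothetical; the real figures are those shown in Figure 5-27.

```python
def market_share(sales_by_stall):
    """Return each stallholder's share of total market sales as a percentage."""
    total = sum(sales_by_stall.values())
    if total == 0:
        raise ValueError("total sales must be greater than zero")
    return {
        stall: round(100 * amount / total, 1)
        for stall, amount in sales_by_stall.items()
    }


# Invented sales figures for three stallholders, in dollars.
sales = {"Stall A": 1200.00, "Stall B": 600.00, "Stall C": 200.00}

shares = market_share(sales)
for stall, share in shares.items():
    print(f"{stall}: {share}% of total sales")
```

In a bubble chart, a percentage like this is a natural choice for the bubble size, because it lets each stallholder be compared against the whole market.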

Data visualisations to show composition of data
The most common use for a composition data visualisation is to show how
sets of data change over time. For example, using the Google public data
website <www.google.com/publicdata/directory>, we can look at an animation
that shows how carbon dioxide (CO2) emissions have changed over time.
Click on 'World Development Indicators (subset)' and
then, under the heading 'Environment', click on 'CO2 emissions'. You
will next be shown a basic line chart of the world's contribution to CO2
emissions. By clicking on 'Australia' on the left-hand side, you will
be shown a basic line chart that shows how Australia's CO2 emissions
have risen over time (see Figure 5-28). Click on another country, such as
Germany, and you can see a comparison (see Figure 5-29).

As more and more data is becoming publicly accessible, companies such
as Google have been investing time and money to create tools to access
and display this data in a visual way.
Google's public data directory shows a collection of key data sets that
are freely available, and its tools allow you to create custom animations
that can be used by the public.

Think about IT 5-10


Go to Google's public data explorer tool
<www.google.com/publicdata/directory> and try comparing the
following sets of data:
1 fertility rates (births per woman) versus life expectancy at birth
2 CO2 emissions versus electric power consumption.

FIGURE 5-28
Line chart created in Google public data, showing how CO2 emissions have risen over
time for Australia.

FIGURE 5-29
Line chart created in Google public data, comparing CO2 emissions for Australia and
Germany.

Click on the World map button also on the left-hand side, and you can
see an animation or composition showing how the CO2 emissions for both
countries (and the world) have changed since the 1960s (see Figure 5-30).
Click on the bubble icon to show a chart composition between two data
sets: CO2 emissions and electric power consumption (see Figure 5-31).

Sources of authentic data


Data can be divided into primary- and secondary-source data. Primary-source
data is acquired directly from the source, whereas secondary-source
data is an interpretation of a primary event or primary-source data.
In the case study of Southside Makers, the sources for their authentic
data came from the Australian Bureau of Statistics (ABS) and the visitor
survey results. There is both a large data pool and a quality process in place
for the distribution and collection of surveys and for data entry.

FIGURE 5-30
World map composition created in Google public data, showing CO2 emissions
highlighting Australia and Germany.

FIGURE 5-31
Bubble chart composition showing relationship between CO2 and electric power
consumption.
The ABS has a census quality statement that outlines four principal
sources of error in its census data set:
respondent error: when the person filling out the survey misunderstands
the question and includes false data
processing error: when the technique used to input the data into the
database, either manual entry or automatic entry using optical character
recognition software, makes a mistake
partial response: when the person filling out the survey has only
completed part of the survey
undercount: when not all of the respondents to the survey have
completed the survey due to a range of reasons.

Sources of error in a data set are sometimes beyond the control of the
collectors.


CASE STUDY:
Southside Makers
When the Southside Makers Committee collect their data, they are
generally dealing with a very small data pool.
If your comparative chart shows that 75 per cent of people who
visited the market were happy with the market, but you only surveyed
12 people, the data pool is probably too small to extract any meaningful
information out of it.
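
One way to see why 12 responses is too few is to estimate the margin of error on the 75 per cent figure. The sketch below uses a standard normal approximation with a 95 per cent confidence level (z = 1.96); both are assumptions made for illustration, and the approximation is itself rough for a sample this small. The function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate margin of error for a surveyed proportion (normal approximation)."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    return z * standard_error


# 75 per cent of respondents were happy, from 12 people and from 1200 people.
small_pool = margin_of_error(0.75, 12)
large_pool = margin_of_error(0.75, 1200)

print(f"12 people:   75% plus or minus {small_pool * 100:.1f} percentage points")
print(f"1200 people: 75% plus or minus {large_pool * 100:.1f} percentage points")
```

With only 12 respondents, the estimate could plausibly sit anywhere from about half to nearly all visitors, which is why the result says little about the market as a whole.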

Information economy
With the increase in data generation,
new economic opportunities appear.
InfoChimps <http://infochimps.org/> is
just one of many companies that deal in
the electronic data marketplace, trading
in data sets.

Quality management of the census program aims to reduce error as
much as possible, and to provide a measure of the remaining error to data
users so that they use the data in an informed way.
Several organisations worldwide have quality processes in place to
manage the collection and reporting of authentic data. Additional sources
of authentic data might be:
The World Bank <www.worldbank.org/>: public access
Organisation for Economic Cooperation and Development (OECD)
<www.oecd.org>: click on 'Statistics' for data on OECD countries or on
'Dynamic Maps and Charts'
CIA World Factbook
<https://www.cia.gov/library/publications/the-world-factbook/>: provides
information on 266 countries worldwide
Australian Taxation Office (ATO) <www.ato.gov.au/>: publishes taxation
statistics based on the data that it gathers through personal and company
tax returns each year.

Data sources and methods of acquisition

Data-collection forms are used to acquire large sets of data.

Think about IT 5-11


Identify three different data-collection
forms that your school uses.


Before we can produce information to solve our problems or to help us
make decisions, we must first start with the raw facts (the data). Facts can be
gathered from a number of sources:
Primary-source data is often facts that have been obtained through
measurement, data-collection forms, interviews, direct observation or by
electronic mail. Authentic sources of primary data might have a quality
process in place to minimise errors in the data set.
Secondary-source data is often data gathered from the published work of
someone else. Books, newspapers and magazines are traditional secondary
sources. Electronic sources include online databases, the Internet (such as
blogs or news sites) and reference material recorded onto CD-ROMs.
Acquiring this information might be achieved by either accessing a
public data bank, such as the ABS or the Bureau of Meteorology, or using
a data-collection form. A data-collection form can provide a quick means
of gathering large amounts of data. The largest collection of data in
Australia is the national census, which occurs every five years.
There are many data sets that are freely available online for organisations
to use in their decision making. For example:
If a family was looking to move suburbs, they might use the Australian
Census data or Australian Electoral Commission data to choose a suburb
that reflected similar political and social beliefs to theirs.
If an organisation was looking to sponsor a child in a third world country,
they might use the World Health Organisation (WHO) data to identify
which areas of the world need their support and sponsorship.

[Figure: column chart titled 'Qualified Health Professionals in Africa'.
Vertical axis: qualified health professionals per 1000 (0 to 1.4). Horizontal
axis: countries bordering Ethiopia (Eritrea, Ethiopia, Kenya, Somalia, Sudan).
Series: midwives, nurses and physicians per 1000.]

FIGURE 5-32
Using World Health Organisation data, we can select appropriate data to assist us in
making a decision.

A survey also uses a data-collection form and provides data about what
the respondents think, such as their preferences in consumer goods or
political parties, what they want from an information system or their role
within an organisation.

Think about IT 5-12
If your school sponsors a child in Africa, use the World Health
Organisation's database to find further information about where they live.

Valerie Browning is an Australian nurse who lives and works in Ethiopia with the
Afar people, an indigenous people whose history dates back thousands of years.
Using World Health Organisation data sets, we can see that the area of Africa in
which she lives is under-resourced compared with surrounding countries and even
Australia.

A survey is used to determine what people believe is the current situation,
or to provide facts as known to the respondent.

Surveys were also discussed on page 122.

FIGURE 5-33
Using the data downloaded from the World Health Organisation site, we hide or delete
the columns of data that we don't want to work with.

When you are planning on acquiring data to produce an information
solution, you need to be mindful of what to expect and what form the data
might come in, especially when you have a handwritten survey.

Data integrity

The adoption of the Universal Product Code (UPC) barcode system in 1973
heralded a new era in data integrity. The last digit of the barcode is
a check digit, which is used to confirm that none of the other digits
has been misread or altered.

Think about IT 5-13


Can you think of any other systems
that use a similar technique to the
check digit to ensure data integrity?

For a computer to produce useful information, the data that is input into
a database must have integrity. Data integrity is the degree to which data
is correct. A misspelled movie title in a movie database is an example of
incorrect data. When a database contains these types of errors, it loses its
integrity. The more errors the data contains, the lower its integrity. Users
will not rely on data that has little or no integrity.
Data integrity is very important because information is used to make
decisions and take actions. When you order a product, such as a book about
sewing or soccer, a process begins that eventually delivers the book to you.
Before you receive the book, the retailer usually charges the order to your
credit card. The retailer will bill an incorrect amount to your credit card if
the book's price is not correct in its database. This type of error costs both
you and the retailer extra time and effort to remedy.
Validating data as it is being entered into the database helps to keep
integrity high. When you purchase a product online, you are
generally asked for a three-digit security code from the back of the credit
card. This code is used as a check that the credit card information
being typed in is correct.
Throughout the process, checks and balances are introduced to ensure
that the order data is correct and that the correct payment information is
received.
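
The check digit on a barcode is a concrete example of this kind of validation. The sketch below follows the standard UPC-A rule: the digits in odd positions are added and multiplied by three, the digits in even positions are added to that total, and the check digit is whatever brings the result up to a multiple of ten. The function names are hypothetical, and the sample code is a commonly quoted example barcode number rather than one taken from this chapter.

```python
def upc_check_digit(first_eleven):
    """Calculate the UPC-A check digit for the first 11 digits of a barcode."""
    if len(first_eleven) != 11 or not first_eleven.isdigit():
        raise ValueError("expected a string of exactly 11 digits")
    digits = [int(ch) for ch in first_eleven]
    odd_sum = sum(digits[0::2])   # 1st, 3rd, ... 11th digits
    even_sum = sum(digits[1::2])  # 2nd, 4th, ... 10th digits
    total = odd_sum * 3 + even_sum
    return (10 - total % 10) % 10


def is_valid_upc(code):
    """Return True if a 12-digit code ends with the correct check digit."""
    if len(code) != 12 or not code.isdigit():
        return False
    return upc_check_digit(code[:11]) == int(code[11])


print(upc_check_digit("03600029145"))  # check digit for the sample code
print(is_valid_upc("036000291452"))    # correctly scanned code
print(is_valid_upc("036000291152"))    # one digit misread
```

If a single digit is misread, the recalculated check digit no longer matches, so the scanner can reject the code instead of storing incorrect data.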

Measurement
Measurement is the most common technique used for gathering data that
is to be processed into information. Examples include an inventory clerk
who conducts a stocktake by counting each type of stock on the shelves and
entering that number onto an inventory form. The manager at a sportsground
may determine the size of a crowd by counting the number of tickets used by
patrons. A nurse will record a patient's weight by reading the measurement
from a set of scales.
Data measurement can also take place electronically. It occurs each time
a shopper's groceries are scanned by a barcode reader at the supermarket
checkout. A school science experiment may be set up so that when the
temperature in a model hothouse reaches 21 degrees Celsius, the heater is
switched off. Other electronic measurements include hits
on a website and traffic passing through a router.

CASE STUDY:
Southside Makers
At the Southside Makers Market, the committee might rely on a number
of strategies for creating metrics of their event. Someone might count the
number of people through the door of the Town Hall. After the event is
over, the organisers might ask every stallholder to tell them how many
sales they had for the day and how much they sold. Of course, metrics
are only as good as the source they came from.

A metric is just a measurement that can be used to evaluate.


Data types and structures relevant to selected software tools
Many different software tools can be used to create data visualisations;
however, most are derivatives of either a spreadsheet tool or a database tool.
For example, using the online data-visualisation tool Many Eyes
<http://manyeyes.alphaworks.ibm.com/manyeyes/> to generate a chart might require
users to format their data into a data table in which the column headers,
rather than the individual cells, display the unit of measurement, such as
dollars ($) or percentages (%) (Figure 5-34).

FIGURE 5-34
When converting the data from a spreadsheet tool into Many Eyes, the user needs to
remove any trace of the $ or % signs.
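
Removing the symbols by hand is tedious for a large table, so the clean-up is often scripted. The sketch below strips dollar signs, percentage signs and thousands separators from cell values so that only plain numbers remain. The function name `to_number`, the column headers and the sample rows are all hypothetical, and this is not a feature of Many Eyes itself.

```python
def to_number(cell):
    """Convert a formatted cell such as '$1,250.50' or '12.5%' to a plain number."""
    cleaned = cell.strip().replace("$", "").replace("%", "").replace(",", "")
    return float(cleaned)


# Invented rows as they might be copied out of a spreadsheet.
raw_rows = [
    {"Stall": "A", "Sales ($)": "$1,250.50", "Share (%)": "12.5%"},
    {"Stall": "B", "Sales ($)": "$980.00", "Share (%)": "9.8%"},
]

# The unit stays in the column header; the cells hold numbers only.
clean_rows = [
    {
        "Stall": row["Stall"],
        "Sales ($)": to_number(row["Sales ($)"]),
        "Share (%)": to_number(row["Share (%)"]),
    }
    for row in raw_rows
]

for row in clean_rows:
    print(row)
```

Keeping the unit in the header means every cell in a column holds the same data type, which is what most charting tools expect.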

Data needs to be separated into distinct fields or columns that only have
one sort of data type in them. Common data types are integer, characters
and strings.

A data type is a set of data with predefined characteristics.

Integer data types


The integer data type is used to represent whole numbers. A number of
different formats are used to represent integer data (see Figure 5-35). The
format used depends on the size of the numbers that need to be stored. An
example of an integer might be the total number of visitors to the market.

Characters
A character variable or constant holds a single letter, number or symbol. For
example, a common use of a character in a survey is a (Y)es/(N)o/(M)aybe answer.

Data types were also discussed on page 5.


Integer type    Range                              Memory used (bytes)
Short           -128 to 127                        1
Normal          -32 768 to 32 767                  2
Long            -2 147 483 648 to 2 147 483 647    4
Extra-long      -2^63 to 2^63 - 1                  8

FIGURE 5-35
Type and range of integer data.
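
The ranges in Figure 5-35 follow directly from the amount of memory used. Assuming signed integers stored in the usual two's-complement form, an integer of n bytes has 8n bits and can hold values from -2^(8n-1) to 2^(8n-1) - 1. The sketch below works this out for sizes of 1, 2, 4 and 8 bytes, which match the four ranges in the figure; the function name `signed_range` is hypothetical.

```python
def signed_range(num_bytes):
    """Return (smallest, largest) value of a signed integer of the given size."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Type names as used in Figure 5-35, paired with assumed sizes in bytes.
integer_types = [("Short", 1), ("Normal", 2), ("Long", 4), ("Extra-long", 8)]

for name, size in integer_types:
    low, high = signed_range(size)
    print(f"{name:<11}{size} byte(s): {low} to {high}")
```

Choosing the smallest type that safely covers the expected values saves memory; the total number of visitors to a market, for example, would fit comfortably in the 'Normal' type.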

Strings
Strings are sequences of characters. For example, a string might be a comment
left by someone in a survey.

Boolean data
Boolean data can have only two values: true and false. An example of a
Boolean answer to the survey question 'Will you attend the market again?'
might be (Y)es or (N)o.

Decimal numbers
Decimal numbers are those that include a fractional or decimal part. For
example, a survey might ask 'How much did you spend at the Southside
Makers Market?'

Date and time data


Date and time data are not stored in the form in which they are displayed, such
as 4.39 p.m. on 9 May 1938 or 09/05/1938. Instead, the date is held in the form
of an integer that counts the number of days from a specified reference point,
such as the start of 1900, and time data is stored as a decimal fraction of the
day (see Figure 5-8 on page 183). For example, a date and time might have been
recorded when your attendance was checked off on the way into a market so that
a distribution chart might be created later on to see which times were the
busiest. For a more in-depth explanation of data types, refer to Chapter 6.
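
The idea of storing a date as a day count and a time as a fraction of a day can be sketched as follows. The reference point of 30 December 1899 is an assumption, chosen because it makes the results line up with the serial numbers that common spreadsheet tools show for dates after February 1900; other software uses other reference points. The function name `to_serial` is hypothetical.

```python
from datetime import datetime

# Assumed reference point (day zero) for the day count.
REFERENCE = datetime(1899, 12, 30)


def to_serial(moment):
    """Convert a date and time to whole days plus a fraction of a day."""
    delta = moment - REFERENCE
    return delta.days + delta.seconds / 86400  # 86 400 seconds in a day


# 4.39 p.m. on 9 May 1938, the example used in the text.
serial = to_serial(datetime(1938, 5, 9, 16, 39))
whole_days = int(serial)
day_fraction = serial - whole_days

print(f"Days since reference point: {whole_days}")
print(f"Fraction of the day elapsed: {day_fraction:.5f}")
```

Because the stored value is just a number, times can be sorted, subtracted and grouped into intervals, which is what makes a distribution chart of busy periods possible.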

Think about IT 5-14


Look through the newspaper and
identify a data visualisation that has
been used in a story. Does the data
visualisation accurately represent the
story? What type of visualisation have
they used and could they have used a
different one?


Purposes of data visualisations


The main purpose of a data visualisation is to reduce the effort required to
analyse the information (make the communication of the data more efficient)
and increase comprehension, or the effectiveness of the information.
Ultimately, we make decisions with information that is given to us. Below are
some examples of the decisions that data visualisations aid with when using a
website.
A bar or pie chart can provide us with a visualisation that will enable us
to compare the origin of visitors to a website. The organisation will be able
to ask questions such as 'Are we getting enough traffic from the desired
country?' This information might be used to evaluate an advertising campaign.
A histogram showing at what time each day (or the distribution) people
look at the website will provide the organisation with data that might help
them decide when information needs to be changed on the website. For
example, if the bulk of visitors check the website at morning-tea time, the
organisation will know that any specials or updates need to be done by this
time.
Data showing a distribution of hits both from a website and an
advertisement might indicate a direct relationship between advertising
hits and traffic. A composition created from the distribution of hits over
a number of months and those on an advertisement might show a bigger
picture about the impact of advertising on web traffic.

CASE STUDY:
Southside Makers
There are many possibilities for Southside Makers to use data
visualisation for decision-making:
A comparative chart might show how many customers are finding
their way to the market each month. Is there a trend going up or
down?
A distribution chart might show when the busiest period of the day
is for stallholders. Is it before, during or after lunch?
A scatter chart showing customers and sales might indicate a
relationship and possibly a tipping point for a good or bad market day.
A composition might be created to show the relationship between
time of day and sales (if you could get that data).

DiRT
Digital Research Tools Wiki <http://digitalresearchtools.pbworks.com/>
is an excellent knowledge repository for all processes associated with data
visualisation. Among the vast collection of links are examples of how data
visualisations can be used.

Design tools for representing data visualisations
A variety of tools can be used to represent the design of a data visualisation:
Layout diagram: A data visualisation needs to show what type of chart
might be used and where the source data might appear; for example, along
the x- or y-axis. Layout diagrams should show colours, headings, axis
labels and legends. They can also show on a map where data might be
shown and where the data source might come from (see Figure 5-36).
Storyboard: Storyboards can be used to show how the data-visualisation
animation might work (see Figure 5-38). For example, if you are using
Flash to show how one data visualisation morphs into another, the
storyboard might show the transitions from one visualisation to another.
Flowchart: Flowcharts can be used to show the procedure that users need
to complete to create a data visualisation.


[Figure: layout diagram for a composite data visualisation. Vertical axis: CO2
emissions (metric tons), 0 to 20. Horizontal axis: year, 1960 to 2010.
Annotations mark the data charted for Germany and the data charted for
Australia.]

FIGURE 5-36
An example of how you might show a composite data visualisation using a layout
diagram.
Source: OECD data from Google public data.

FIGURE 5-37
Example of data that might be used for a visualisation showing carbon dioxide
emissions (metric tons per capita) over many decades.

Think about IT 5-15


Create a flowchart that accurately
shows a procedure for creating a
simple bar chart.

[Figure: storyboard of two map-of-the-world frames, for 1960 and 2010.
Annotations specify Germany in red and Australia in blue, label sizes of 14 pt
and 20 pt, and that the data changes to match CO2 emissions.]

FIGURE 5-38
An example of how you might show a composite data visualisation using a storyboard.
Source: OECD data from Google public data.

Characteristics of users

See page 10 for more information on characteristics of audiences.

As outlined in Chapter 1, many characteristics need to be taken into
consideration when creating information, such as gender, special needs,
culture, age, education level, status and location. When creating a data
visualisation, you might therefore want to also consider the following:
Age: The visualisation being shown must be age appropriate. If a visualisation
is being shown to a primary-school class about the distribution of sporting
fields in a municipality, you might use different-sized footballs rather than
bubbles to appeal to a younger audience.
Special needs: Special needs such as sight impairment might change the way
you visualise your data to ensure that people with special needs can read the
axis labels more clearly. Composition visualisations might move a bit more
slowly, or sound might be used to indicate an incline or decline in statistics.
Culture: If you are preparing a visualisation that is to be presented to a
community group whose members are mainly non-English speaking, you
may aim to use more symbols and drawings rather than words in your
presentation.

The main purpose of using a data visualisation over a conventional
description or list of the data is to engage the user so that they can easily
interpret the data that is being given to them.

CASE STUDY:
Southside Makers
In addition to the characteristics of the user, the way the information is
going to be used influences how it is created.
In the case of Southside Makers, they are after information about the
community, which is map based; for example, they need to identify cafés
at which to drop their pamphlets off.
Displaying the information for them in a tabular list will not suit their
purpose. So a map with streets clearly labelled will need to be created.

Think about IT 5-16


Identify a demographic that might
benefit from a data visualisation rather
than from a tabular list.

Evaluation criteria and techniques


Evaluation is an important part of the problem-solving methodology. It allows
us to prove or disprove that our solution has met the needs of users. To do
this, we have to take a look back at our problem statement.
Generally, we can evaluate our data visualisations based on two key
areas: efficiency and effectiveness.

Evaluating the efficiency of data visualisation
Efficiency deals with quantitative data that can be easily measured:
Time: Has the data visualisation allowed us to save time in processing the
data into information?
Cost: Has the data visualisation decreased the cost associated with
processing the data, be it labour or equipment costs?
Effort: Can further data sets be added to the visualisation at a future date
with minimal effort?

Think about IT 5-17


Identify a strategy to evaluate the
efficiency of a data visualisation; for
example, the data visualisation that
shows late students on page 179.

Evaluating the effectiveness of data visualisations
Effectiveness deals with qualitative data that is a bit harder to measure.
However, by taking a look at the Southside Makers case study again, we will
notice strategies for evaluating their data visualisations:
Quality: Has the data visualisation provided users with the means to
understand the data quickly? Does it provide new insight into the data?
Relevancy: Does the data visualisation show the relevant data, or do users
need to hunt through the visualisation to find it?
Timeliness: Is the data visualisation being communicated in a timely
manner? Is the data set being used in a timely manner or is it too old?

Accuracy: Is the data visualisation accurate? Can we compare a list of data
with the visualisation to ensure its message is accurate?
Clarity: Does the data visualisation provide ambiguous information,
therefore potentially leading to misinterpretation?

Using problem statements to evaluate data visualisations
When the problem statement is clearly defined, it contains words that will
guide us in how we might evaluate the information or data visualisation
produced (Figure 5-39). Taking a look at the Southside Makers Market
problem statement, one key word is used in describing the problem that
can be used to evaluate the data visualisation: 'How can Southside Makers
improve the effectiveness of their marketing campaign to increase attendance
at their markets?'
When we evaluate the Southside Makers data visualisation against the
problem statement, we should draw on the key word that we have in this
problem statement:
Does the data visualisation improve the quality (effectiveness) of
the marketing campaign by providing information that is easier to
understand?
Is the visualisation more relevant (effective) to the task of improving
the effectiveness of the marketing campaign compared with previous
campaigns?
Is the data visualisation timely (effective)?

Characteristics of file formats and ease of conversion
More and more websites are using PNG
as a file format. PNG stands for portable
network graphics. It is an improved
version of the GIF file format. JPEG is still
a popular file format.

Think about IT 5-18


Using Chart Chooser, create a pie
chart that shows football teams your
classmates support.

Most of the data-visualisation tools will produce a visualisation that is able to


be saved and then used as part of a presentation or report.
In the case of the Southside Makers Market case study, the
visualisations created at the ABS website, such as Figure 5-11 on page 186,
can be saved as a JPEG by simply right-clicking on the file and then saving
it into an appropriate directory.
Websites that deal with public data sets, such as Google public data
explorer <www.google.com/publicdata/>, use flash to animate their
visualisations. Depending on the browser, there are tools available to enable
users to capture these flash files and save them to their hard drives and
insert them into PowerPoint.
Tools such as Chart Chooser, by Juice Analytics
<http://chartchooser.juiceanalytics.com/>, allow users to download a template into their chosen
application, such as a spreadsheet or presentation tool. They can then alter
the data set to produce a visualisation that meets their needs.
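
One characteristic that separates file formats such as PNG, JPEG and GIF is the short sequence of bytes that each file starts with. The following is a minimal Python sketch of how a saved visualisation's format could be checked before it is converted or inserted into a report. Python is not one of the tools used in this chapter, and the function names are illustrative only.

```python
# Leading bytes ("signatures") that identify three common image formats.
SIGNATURES = {
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "JPEG": (b"\xff\xd8\xff",),
    "GIF": (b"GIF87a", b"GIF89a"),
}


def detect_image_format(data):
    """Return the format name matching the first bytes of a file."""
    for name, signatures in SIGNATURES.items():
        if data.startswith(signatures):
            return name
    return "unknown"


def detect_file_format(path):
    """Read only the first eight bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return detect_image_format(handle.read(8))
```

A check like this relies on the content of the file rather than its extension, which can be wrong if a file has simply been renamed.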

Functions of data visualisation software


A basic spreadsheet tool can provide you with the functions necessary to produce a
static data-visualisation information solution. Basic comparative, distributive
and relationship-based charts can be created using a common spreadsheet
package. However, if you want to create an animated visualisation of a
data set, you need to register and use some of the more complex tools on the
Internet, including:
Many Eyes <http://manyeyes.alphaworks.ibm.com/manyeyes/> allows
registered users to upload data sets, create complex visualisations and
share them with the world.
Google Chart Tools <http://code.google.com/apis/charttools/> allows users to
create complex animations using a wide range of charts (Figure 5-39), and to
overlay these onto Google Maps <http://maps.google.com/> if required.

Chart axes: days of the week (Sun to Sat) on the vertical axis; hours of the day on the horizontal axis.

FIGURE 5-39
This punch-card scatter chart is just one example of the user-submitted charts available
at Google Chart Tools.

Any Chart <www.anychart.com> allows users to play around with
XML code to customise their charts. Although this product is not free,
registered users can download a demonstration to use.
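
Whichever tool is used, one of its basic functions is selecting only the data that the visualisation requires. The following is a minimal Python sketch of that idea, assuming the data set is held as CSV text. The column names and figures are made up for illustration, and `select_columns` is a hypothetical helper rather than a function of any of the tools listed above.

```python
import csv
import io


def select_columns(csv_text, wanted):
    """Return each row of the CSV text, keeping only the wanted columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in wanted} for row in reader]


# Made-up sample data for illustration only.
SAMPLE = (
    "month,attendance,stalls,rainfall_mm\n"
    "January,420,35,12\n"
    "February,515,38,3\n"
)

# Keep only the two columns needed for a simple attendance chart.
selected = select_columns(SAMPLE, ["month", "attendance"])
```

Note that every value read from a CSV file arrives as text, so numeric columns still need to be converted before they are charted.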

Formats and conventions applied to visualisations
A good data visualisation tells one story. This story should be easy to
understand, clear to identify and without ambiguity. The formats and
conventions that we use should support this. If you remember back to Chapter 1,
a format and convention can be used to enhance the appearance of information
and make it more readable. Formats and conventions used in data visualisations
are similar to what you would use for most other solutions.

Think about IT 5-19


Using Google public data from OECD
countries, create a visualisation
comparing two sets of data.

Formats for data visualisations


A format helps to change the appearance of a piece of information using
features such as font, margins, spacing, columns, tables, graphics, borders,
page numbers, and headers and footers. A good data visualisation should use
formats that increase the effectiveness of the information. These might include:
easy-to-read fonts for headings
a chart format that clearly shows the information being communicated
colour gradients that are easy to distinguish.
For example, using a bubble chart overlay over a map might not be the best
format for communicating information on the number of bars in the United
States (see Figure 5-40). The use of colour variants might have been a better
method for communicating which state has the most drinking establishments.
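
The colour-variant approach can be sketched in a few lines of Python. This is an illustrative sketch only: the light and dark blue values are arbitrary choices, and `shade_for` is a hypothetical helper, not part of any charting tool mentioned in this chapter. It converts a value into a shade between the two colours, so that regions with higher counts appear darker.

```python
LIGHT_BLUE = (222, 235, 247)  # shade used for the lowest value
DARK_BLUE = (8, 48, 107)      # shade used for the highest value


def shade_for(value, lowest, highest):
    """Return a hex colour between light blue (low) and dark blue (high)."""
    if highest == lowest:
        fraction = 0.0
    else:
        fraction = (value - lowest) / (highest - lowest)
    # Keep out-of-range values within the light-to-dark scale.
    fraction = min(1.0, max(0.0, fraction))
    channels = [
        round(light + (dark - light) * fraction)
        for light, dark in zip(LIGHT_BLUE, DARK_BLUE)
    ]
    return "#{:02x}{:02x}{:02x}".format(*channels)
```

Using a single hue that runs from light to dark keeps the gradient easy to distinguish, which is one of the formats listed above.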

Think about IT 5-20


Using data retrieved from one of the
large data providers (for example, WHO,
World Bank or Australian Census),
create an example of a good and a bad
visualisation. Outline the good and bad
features of each.

Legend: from fewer references to bars to more references to bars.

BARS: Size of symbol represents absolute number of mentions of bars in the Google Maps directory. A maximum value of 1139 is
located in Chicago, IL. Data collected in August 2008.
Copyleft Floatingsheep.org, 2010

FIGURE 5-40
This data visualisation is overcrowded and too busy to get any meaningful information
from it.

Junk Charts <http://junkcharts.typepad.com/> is an interesting blog
that looks at bad and confusing charts
and how they try to communicate
information. It is a great source of data
visualisations gone wrong.

Think about IT 5-21


Visit the Junk Charts website and
select a chart. Identify two ways in
which formats and conventions have
not been correctly applied.

Conventions for data visualisations


A convention is a well-known rule used to communicate information. The most
popular example is the way an envelope is addressed. The convention is to put
the person's name, address, suburb, state, postcode and then country, if necessary.
Conventions for data visualisations include:
clearly identifying the data source at the footer of the visualisation
providing a heading, stating the purpose of the visualisation
inserting labels that show what the data is and what unit of measurement
is being used
inserting a footer showing who has created the chart and for whom.
Chart shown in Figure 5-41: 'Qualified Health Professionals in Africa'.
Vertical axis: Qualified health professionals per 1000 (0 to 1.4).
Horizontal axis: Countries bordering Ethiopia (Eritrea, Ethiopia, Kenya, Somalia, Sudan).
Legend: Per 1000: Midwives; Per 1000: Nurses; Per 1000: Physicians.
Footer: Source: World Health Organisation data: http://www.who.int/ 12 Sept 2010. Created by: Sally Bloggs 20 Sept 2010.
Annotations on the chart:
What is the visualisation doing?
Axis headings must be clear.
Source of the data for the visualisation and when the data was retrieved.
Author clearly identified and when the visualisation was created.

FIGURE 5-41
An annotated diagram showing which conventions have been used in the visualisation.
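
The conventions listed above can also be treated as a checklist. The following Python sketch is one possible way of doing this, assuming the details of a chart are stored in a dictionary. The key names and the `missing_conventions` helper are hypothetical; the example values are taken from Figure 5-41.

```python
# Each key is an assumed field name; each value describes the convention.
CONVENTIONS = {
    "heading": "a heading stating the purpose of the visualisation",
    "x_label": "a label for the horizontal axis",
    "y_label": "a label for the vertical axis, including the unit",
    "source": "the data source and when the data was retrieved",
    "author": "who created the chart and when",
}


def missing_conventions(chart):
    """Return a description of every convention the chart does not meet."""
    return [
        description
        for key, description in CONVENTIONS.items()
        if not str(chart.get(key, "")).strip()
    ]


figure_5_41 = {
    "heading": "Qualified Health Professionals in Africa",
    "x_label": "Countries bordering Ethiopia",
    "y_label": "Qualified health professionals per 1000",
    "source": "World Health Organisation data: http://www.who.int/ 12 Sept 2010",
    "author": "Sally Bloggs 20 Sept 2010",
}
```

Calling `missing_conventions(figure_5_41)` returns an empty list, whereas a chart saved without its source line would be reported as missing that convention.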

What you should know


1 The problem-solving methodology is used as a
framework for solving information problems.
2 An information problem occurs when
an information system does not produce
the information required for a user or an
organisation.
3 There are three different ways in which
information can be presented or communicated:
written form, such as a ten-page report; oral,
such as a voicemail or a broadcast; and visually,
such as a chart or animation.
4 When looking at a distribution of data, we look
for changes over a period of time.
5 A data visualisation is when data is presented
in a visual format so that it has greater meaning
or where understanding can be reached in a
shorter period of time.
6 When processing data into information, there are
four basic types of data visualisations: comparing
sets of data, showing a distribution, showing a
relationship, and composing an animation that
shows how the data changes over time.
7 When analysing an information problem, we
first need to define the scope, information
requirements and information constraints.
8 A project constraint is often defined by the time
and the resources available.
9 Quantitative data is measurable and specific.
10 Qualitative data is harder to measure and
often deals with people's opinions and feelings
towards something.
11 Quantitative data can be gathered through
surveys (handwritten and online) and statistics.
12 Qualitative data can be gathered through
interviews and questionnaires (handwritten and
online).
13 A CSV file is a plain text file that is specifically
formatted to store spreadsheet- or database-style
information.
14 Demographic data is data that deals with the
characteristics of a group of people.
15 A text cloud is an effective way of
communicating qualitative data as it highlights
key words and how often they have been used.
16 Data visualisations can be used either as a
product resulting from the problem-solving
methodology or as a tool to further analyse a
problem.
17 A data visualisation ensures that complex
data is communicated in a way that makes
understanding it easier.
18 A data set is a collection of data; for example,
census data collected every five years might be
regarded as a data set.
19 A constraint is any factor that affects the
nature of a solution, such as time or resources
available.
20 A limitation might involve technical, economic
or aesthetic limitations.
21 Charts such as pie, column or bar can be used as
data visualisations to compare two or more sets
of data.
22 A data visualisation showing a distribution of
data might be communicated using a scatter
chart, line histogram or column histogram.
23 A histogram shows how data has changed over a
timeframe; for example, hours, days, months and
years.
24 A scatter chart can be used to show a
distribution of data against a fixed unit of time.
Complex scatter charts can show a comparison
of data and therefore identify a relationship
between them.
25 A data visualisation showing a relationship
between two sets of data might show this
relationship using a scatter or bubble chart.
26 A bubble chart displays one set of data as
bubbles, which are relative to their values.
27 A data visualisation showing a composition
of data might be communicated using an
animation; that is, morphing one visualisation
into another.
28 Data falls into two different types: primary
and secondary source. Primary-source data is
data acquired directly from the source, whereas
secondary-source data is an interpretation of the
original data.
29 When collecting data, there are four sources
of error that might occur: respondent error,
processing error, partial response and
undercount.
30 Respondent error is when a data-collection form
has been filled out incorrectly.
31 Processing error occurs when there is an error
with manual or automatic data entry into the
system.

32 Partial response is when only part of the
data-collection form is filled out.
33 Undercount is when not all respondents have
filled out the data-collection form, therefore
making the data set quite small.
34 Data-collection forms are used to acquire
large sets of data. This can be done manually
or electronically. A common method for data
collection is a survey.
35 Data integrity is the degree to which data is
correct. Validating data as it is entered into the
database can keep integrity high.
36 Data measurement, or metrics, is a simple way
of collecting data. This deals with measuring
fixed items such as attendance, temperature,
speed or stock levels.
37 A data type is a way we categorise data. Basic
data types include integer, character, string,
Boolean, decimal number, and date and
time.
38 The purpose of a data visualisation is to reduce
the effort required to analyse information and
increase comprehension.
39 We can use a number of design tools for
representing data visualisations: layout
diagrams, storyboards and flowcharts.
40 A layout diagram can show what type of chart
we might use, and indicate where the source
data, axis labels and headings come from.
41 A storyboard can be used to show how a
data visualisation animation might work; for
example, from data set A to data set B.
42 A flowchart might be used to show the process
or procedure that the user needs to go through to
create a visualisation.
43 Characteristics of users that might influence the
type and presentation of visualisations include
age, special needs and culture.
44 Evaluating data visualisations might involve
assessing whether the visualisation is efficient
(less time, less cost and less effort).
45 Evaluating data visualisations might involve
assessing whether the visualisations are effective
(higher quality, relevancy, timeliness, accuracy
and clarity).
46 When we evaluate a data visualisation against a
problem statement, we look for key words that
can assist us in the evaluation statements.
47 Most data visualisations can be saved in
JPEG or PNG file formats. Animated data
visualisations can be saved in a Flash file format.
48 Software tools used to create data visualisations
range from spreadsheets to bespoke online tools.
49 Data visualisations should be formatted clearly
to show easy-to-read headings, a clear chart
format and colour gradients that are easy to
distinguish.
50 Data visualisations have the following
conventions applied to them: headings and axis
labels, clear identification of data source, who
the visualisation is created for, and who created
the chart.
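
Points 13, 35 and 37 above can be brought together in a short Python sketch: values read from a CSV file arrive as text, and converting each one to its data type as it is entered is a simple form of validation that helps keep data integrity high. The field names and the `convert_record` helper are made up for illustration and do not come from any tool used in this chapter.

```python
from datetime import date, datetime
from decimal import Decimal


def to_boolean(text):
    """Treat 'true', 'yes' and '1' (in any case) as True."""
    return text.strip().lower() in ("true", "yes", "1")


def to_date(text):
    """Convert text such as '2010-09-04' into a date."""
    return datetime.strptime(text, "%Y-%m-%d").date()


# Assumed field names, each paired with the data type it should hold.
CONVERTERS = {
    "stall_name": str,       # string
    "visitors": int,         # integer
    "takings": Decimal,      # decimal number
    "raining": to_boolean,   # Boolean
    "market_date": to_date,  # date
}


def convert_record(record):
    """Convert each text value to its data type, or raise ValueError."""
    typed = {}
    for field, convert in CONVERTERS.items():
        try:
            typed[field] = convert(record[field])
        except (KeyError, ValueError, ArithmeticError) as error:
            raise ValueError(f"bad value for {field!r}") from error
    return typed
```

A record with text in the `visitors` field, or with a field missing altogether, is rejected with an error that names the faulty field instead of being passed on to the visualisation.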


Test your knowledge


Revising the problem-solving methodology
1 What is an information problem?
2 What are the three different ways information
can be presented?
3 What is a data set?
4 What are the two factors that data visualisation
can increase?

Presenting information in a visual form


5 What is a data visualisation?
6 Why might we favour a data visualisation over
other ways of presenting information?
7 What are the four different ways you can present
information in a visual form?
8 What is the difference between a visualisation
that compares and a visualisation that shows
relationships?
9 Give an example of a visualisation that would
show a distribution of data.

Analysing information problems


10 What are the three factors that should be taken
into consideration when analysing a problem?
11 Identify three common constraints that might
exist when analysing an information problem.
12 How might a set of user requirements change the
nature of an information solution?
13 What is a scope?
14 What is the difference between quantitative and
qualitative data? Give examples.
15 What strategies might we use when gathering
quantitative data?
16 What data visualisations can we use to analyse
qualitative data?
17 What is a thematic map?
18 Identify three constraints that might influence
the nature of a solution.
19 Why is it important to consider the limitations
of a solution when designing a solution?

www.nelsonnet.com.au/infotech

Types of data visualisations


20 How many chart types do you know? List and
give an example of how they might be used.
21 How does a histogram differ from a line chart?
22 What visualisation can we use to demonstrate
what might happen over a given timeframe?
23 How might a scatter chart be used?
24 How does a bubble chart differ from a scatter
chart?
25 If we had two sets of data and we needed to
show how the data changed over time, what type
of visualisation might we create?

Sources of authentic data


26 What are the four types of errors that can occur
when acquiring data for a visualisation?
27 Why would it be important to identify if a data
set has an undercount?
28 What is the difference between primary- and
secondary-source data?
29 Why is it important to identify the source of a
data visualisation?
30 What is the difference between a survey and a
data-collection form?
31 How does data integrity make a difference to the
visualisations you produce?
32 Give three examples of data measurement.

Data types and structures relevant to selected software tools
33 When creating a data visualisation, why is it
important to define the data type that you are using?
34 What is a CSV file?

Purpose of a data visualisation


35 How might a design assist you in clarifying the
purpose of a data visualisation?
36 Draw a flowchart that explains the process that
you need to go through to create a thematic
map from the Australian Bureau of Statistics
website.
37 Create a storyboard that shows how a data
visualisation might work.


Characteristics that can influence type and presentation
38 How might age impact on the design of a data
visualisation?
39 Provide an example of how culture might
change the form of data visualisation.
40 Identify a special need that might change the
format of a data visualisation.

Criteria and techniques for evaluating a data visualisation
41 Provide an example of how you might evaluate a
data visualisation to demonstrate efficiency.
42 Create three questions that would test for clarity
in a data visualisation.
43 How can we test for accuracy when creating a
data visualisation?

Characteristics of file formats


44 A member of your team has asked you for a copy
of a data visualisation that appeared in a Word
document. What file format would you send it in?
45 Into what file formats can we save data
visualisation animations?

Functions of appropriate software tools


46 Only two columns of data need to be used to
create a data visualisation for a presentation.
Using a flowchart, explain how you might select
the required data for the data visualisation
required.

Formats and conventions


47 Identify three formats that each data
visualisation should demonstrate to ensure
clarity.
48 Identify and explain three conventions that
every data visualisation should show to ensure
relevancy.


Apply your knowledge


1 Devise a list of five questions that you could use
to test users' need for a visual solution.
2 A local shopkeeper has a laser tracker on
the door of their shop. It beeps every time a
customer walks through the door to the shop.
Two beeps equals one customer in the store. At
the end of the week, the shopkeeper downloads
the data from the laser tracker and then spends
lots of time looking through the data for patterns.
a What might this information be used for?
b What form of data visualisation might this
shopkeeper use to make meaning out of the
data set that they have access to?
c What other data sets might be readily available
to this shopkeeper?
d What types of relationships can this
shopkeeper create to make business decisions?
3 Locate a bad data visualisation.
Annotate a printout of this visualisation to
demonstrate the changes that you would make to
both the format and conventions to ensure that
the information solution has both clarity and
relevancy.
4 For years your school has sponsored a child in
Sudan. The sponsorship of that child has come
to an end and your school needs to choose
another child to sponsor. Your school would like
you to produce a series of data visualisations to
assist them with making the decision on which
country to sponsor a child in.
a Go to a child sponsorship web site such
as the World Vision web site
(http://www.worldvision.org/) and locate three potential
sponsorship children in three different third
world countries. Research which areas of
education or health a sponsorship
program assists with.
b Go to the World Bank Data web site
(http://data.worldbank.org/) and retrieve statistics on
Education and Health for your three countries.
Which statistics would allow us to make a
decision on which child to sponsor? How do
these statistics compare with Australia?
c Create a compelling presentation with
appropriate data visualisations using either a
spreadsheet charting tool, animation software
or one of the many visualisation web sites
mentioned in the chapter to argue in favour of
the sponsorship of a child in one of the three
countries you have researched.


Preparing for Unit 2 Outcome 1


On completion of this unit, students should be able
to apply the problem-solving methodology and use
appropriate software tools to create data visualisations
that meet users' needs.
In Unit 2 Outcome 1, you are required to produce a
data visualisation that meets the need of a user. This
will involve using a complex data set and accessing
software or online tools that will enable you to convert
this data into a more meaningful format.

Learning milestones
1 Familiarise yourself with the design brief and
analyse the information problem that exists.
2 Determine the type of data visualisation that
might be needed to solve the information
problem.
3 Select appropriate sources of data and identify
relevant data.
4 Determine the suitability of different data types
and structures for creating visualisations.
5 Select types of visualisations that are
appropriate to the data.
6 Select and apply appropriate tools to plan the
design of the visualisations.
7 Apply software functions to locate and acquire
data that will be input and manipulated.
8 Use appropriate software tools, and select and
apply a range of suitable functions.

Steps to be followed
The analysis, design, development and evaluation
stages of the problem-solving methodology will be
used to create the solution.
1 Analyse and define the problem. From the
design brief, identify:
a the factors that affect the problem
b the data visualisation needs of the user
c the data visualisation product to be produced
to meet the user's needs
d appropriate sources of data that must be
processed to produce the solution.
The problem to be solved is best written as a
question. For instance: Is there a pattern in the late
attendance of students across a one-month period?

2 Design the solution. This includes:
a use of techniques such as flowcharts,
storyboards or layout diagrams for all aspects of
the output, both on screen and in printed reports
b a decision about which software and hardware
are most suitable for producing the data
visualisation
c formats and conventions that will be used in
the data visualisation
d selection of the test data that will be used
to conduct tests to check that the data
visualisation is producing the information that
is required.
3 Develop a data visualisation that can be viewed
either on screen or in hard copy, depending on
the needs of the user.
a Produce a clear and concise data visualisation.
b Show the source of the data that you are using.
c Ensure that you have correct headings, axis
labels and measurements to ensure clarity in
the data visualisation.
4 Test the solution. Detail how you have checked
that the output produced is working as expected.
Test data selected in the design stage will be
used.
a Test that the source data is displayed correctly
in the data visualisation.
b Test the clarity of the data visualisation.
c Test the relevancy of the data visualisation.
You will need to provide testing tables, showing
functions of the data visualisation software tested,
results that are expected using the data and the actual
result that was observed when using the software to
produce the data visualisation. For example, use a
testing table similar to the one shown below.
5 Evaluate the solution.
Prepare an evaluation feedback form and gain
feedback from the known audience.
6 Discuss the likely impact on the ability of the
user to make clear decisions based on how
information is communicated through the
data visualisation. For instance, identify the
decisions the user will be able to make with the
data visualisation.
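
The testing table described in step 4 can also be kept electronically. The following Python sketch shows one possible layout, assuming the three columns named in that step (function tested, expected result and actual result) plus an added outcome column. The helper names and the sample test are illustrative only and are not the layout prescribed for the outcome.

```python
HEADINGS = ("Function tested", "Expected result", "Actual result", "Outcome")


def build_testing_table(tests):
    """Turn a list of test records into rows with a Pass/Fail outcome."""
    rows = []
    for test in tests:
        outcome = "Pass" if test["expected"] == test["actual"] else "Fail"
        rows.append(
            (test["function"], test["expected"], test["actual"], outcome)
        )
    return rows


def format_table(rows):
    """Return the table as plain text, one row per line."""
    lines = [" | ".join(HEADINGS)]
    lines += [" | ".join(row) for row in rows]
    return "\n".join(lines)


# Made-up test record for illustration only.
sample_tests = [
    {
        "function": "Chart heading",
        "expected": "Heading states purpose",
        "actual": "Heading states purpose",
    },
]
print(format_table(build_testing_table(sample_tests)))
```

Recording the expected result during the design stage, before the visualisation is built, keeps the comparison with the actual result honest.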
