
Essential for software testers
TESTER
SUBSCRIBE: it's FREE for testers
August 2014 v2.0 number 28
This issue of Professional Tester is sponsored by Neotys
Including articles by:
Henrik Rexed
Neotys
Gregory Solovey and
Anca Iorgulescu
Alcatel-Lucent
Nick Mayes
PAC UK
Normand Glaude
Protecode
Edwin van Vliet
Suprida
Sakis Ladopoulos
Intrasoft International
From the editor
Keep testing in the lead
Is software testing as important as
software requirements engineering,
design, programming or project
management?
No. It is much more important than
all of these. Testing should be the
framework upon which they are
built. Their only aims should be first
to facilitate the creation of all the
right tests, then to ensure they all
pass. Testing is not a safety net to
rescue the incompetent from failure:
it is the systematic, generic method
applied throughout the lifecycle and
by which success is achieved.
This issue is made possible by our
sponsor Neotys: a performance
testing tool vendor committed to
testing principles. We recommend
readers evaluate its product Neoload
which is available in a completely
free edition limited only in the
number of virtual users it simulates.
If you are one of the tens of thousands
of testers who like PT, please help us to
keep it free to read by considering the
offerings of those who support it, and
letting them know you appreciate that.
Edward Bishop
Editor
IN THIS ISSUE
Testing in the lead
4 Model answers
Testing-led specification with Henrik Rexed
9 QA of testing
Testing-led process with Gregory Solovey and Anca Iorgulescu
13 Get ready for testing in the front line
Testing-led digital transformation strategy with Nick Mayes
19 Zero tolerance
Testing-led project management with Sakis Ladopoulos
21 Open but hidden
Testing-led third-party code management with Normand Glaude
24 Forget me not
Testing-led test data management with Edwin van Vliet
Visit professionaltester.com for the latest news and commentary
Contact
Editor
Edward Bishop
editor@professionaltester.com
Managing Director
Niels Valkering
ops@professionaltester.com
Art Director
Christiaan van Heest
art@professionaltester.com
Sales
Rikkert van Erp
advertise@professionaltester.com
Publisher
Jerome H. Mol
publisher@professionaltester.com
Subscriptions
subscribe@professionaltester.com
Contributors to this issue
Henrik Rexed
Gregory Solovey
Anca Iorgulescu
Nick Mayes
Sakis Ladopoulos
Normand Glaude
Edwin van Vliet
Professional Tester is published
by Professional Tester Inc
We aim to promote editorial independence and free debate: views expressed by contributors are not necessarily those of the editor nor of the proprietors.
© Professional Tester Inc 2014
All rights reserved. No part of this publication
may be reproduced in any form without prior
written permission. Professional Tester is a
trademark of Professional Tester Inc.
Testing in the lead

Model answers

by Henrik Rexed

Henrik Rexed explains how to get the numbers nearly right first time

Specifying accurate and testable performance requirements
Performance testing is often done in a way that is contrary to the principles of testing. An application is put under arbitrary load and its response times measured. Measurements that seem large, relative to one another or compared with arbitrary expectations, are investigated and addressed, and the same test is run again to demonstrate that the change has been effective.

But how do we know the right things are being measured under the right conditions? If not, there may have been no need for the changes. In fact the changes may well have worsened, or even caused, performance issues that will matter in production but have been missed by testing.
Just as functional testing has no meaning without trusted requirements, performance testing can do nothing to provide assurance unless what needs to be assured has been defined formally before testing starts. In both kinds of testing the requirements will change, for many reasons including in the light of test results: but adequate investment in specifying the right requirements means that testing can provide a clear result (requirements are met, or not: that is, the application will or will not fail in production). The closer to right first time those specifications are, the more empowered testing is to save massive effort in management, operations and development, and the less testing costs.
When an application passes performance testing then fails in production, proving the testing to have been unrealistic, it is easy but wrong to blame the testing itself or the tools used to execute it. The real problem is test design without a correct basis. It is necessary to ask "what did we need to know which, if we had known it, would have allowed us to predict this failure before production?". In other words: for what should we have been testing?
This article will attempt to provide a generic answer to that question by defining a model minimum set of performance specifications suitable for use as test basis, and explaining how to estimate accurate quantitative information, working from high-level business requirements, to populate that model set. The availability of these accurate estimates gives performance testing, regardless of how it is carried out, a far greater chance of success.
Important user transactions
The method described here is for online user-facing applications and specifies performance from the user's point of view, by considering the time between action and response. Each different action available to the user (that is, the user transactions that, individually or in sequences, provide the functions of the application) must be treated separately. Many of them will require similar performance and can be specified in groups, but these are the least important, ie the least likely to cause performance or reliability failure. Attempting to specify the more important ones as a group will result in very inaccurate figures for most of them.
Which are the important user transactions to specify is usually obvious from a high-level understanding of the application: those user transactions that require complex data transactions or processing, especially with or by subsystems. However it is not always so simple for applications which perform transactions not triggered, or not immediately, by single or specific user action: ie "push" rather than "pull" transactions. Here a more detailed understanding of the technical design might be needed.
Application and transaction capacity
The capacity of an application is a set of values defining the maximum number of simultaneous users to whom it must deliver the performance levels (whose specification will be discussed below). These values are:

session initiations per unit time (eg hour). This is the rate at which users are expected to begin interacting with the application and reflects the expected busyness or traffic

concurrent users. This is related to session initiations but also takes into account the expected length of use sessions, that is the rate at which users stop interacting with the application

user accounts. For applications which require users to be registered and to log in, this figure is the ceiling of the previous one: the maximum number of users who could be interacting with the application at any time. However many applications offer certain functions to unregistered users too. In this case the maximum number of user accounts (and where applicable the maximum number of accounts of each of the different types) is used to estimate the next figure

concurrent user transactions (for each of the user transactions).
Obviously the correct entity to define required capacity is the acquirer, ie the business. Sometimes this is easy: for example where an application or user transaction is available to a limited group of employees, associates or customers whose growth is reasonably predictable. For a public-facing application that aims to increase its usership, the key source of information is the business plan. This must contain estimates of ROI and therefore of market share, turnover, average sale price or similar, which can be used to derive the expected usership and so the necessary capacity. Moreover, these estimates will tend toward the optimistic, which reduces the risk of performing testing under insufficient load.

Unless the usership of the application or a user transaction is truly closed, so the maximum usership is known with good accuracy, capacity figures should be derived from the estimated maximum usership figures multiplied by 3. This is to assure against peak loads which may occur for multiple reasons, including:
recovery: if the application becomes unavailable for functional or non-functional reasons (including transient conditions), usership is multiplied when it becomes available again

coincidence/external/unknown: sometimes demand reaches high peaks for no predictable reason. Anyone who has had a job serving the public will recognize this phenomenon; for example, every supermarket worker has experienced the shop suddenly becoming very busy in the middle of Tuesday afternoon, usually the quietest time. We need not be concerned here with its causes, but it is interesting to note that an external reason is not necessarily required: it can be explained purely mathematically

transient conditions: network-layer events such as latency, packet loss, packets arriving out of order etc increase effective load by preventing threads from completing, holding database connections open, filling queues and caches and complicating system resource management. The same effects can also be caused by application or system housekeeping events such as web or memory cache management, backup, reconciliation etc. While the second group of events, unlike the first, is to some extent predictable and controllable, as we have already seen the production load is unpredictable, so we must assume that all can happen at the same time as peaks in production load.
Applying the 3 times rule should mitigate these risks sufficiently, but a greater one remains: unexpected popularity. If testing is based on expected capacity but usership then expands much faster than expected, then as well as failing to serve the new users, perhaps losing a once-in-a-lifetime business opportunity, their presence may well cause the application to fail to serve existing customers who may be even more valuable.
If such an event is considered possible and cost is no object, one would build the application and provision infrastructure to be capable of passing performance testing at extremely high load. Realistically however it must be mitigated by scalability testing, that is testing to assure against failing to increase the application's capacity economically when needed. For performance testing purposes we need to know the highest currently-expected capacity, and every effort should be made to help the business to estimate this accurately and commit to those estimates.
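To make the arithmetic concrete, here is a minimal sketch, in Python, of how the capacity figures could be derived once the business has committed to its estimates. All of the input numbers, and the simple steady-state assumption used for concurrency, are invented for the example; the real inputs must come from the business plan.

# Illustrative sketch: deriving capacity figures from business-plan estimates.
# All input numbers are invented for the example; real values must come from
# the business plan and be agreed with the business.

PEAK_FACTOR = 3  # the "multiply by 3" rule described above

# Assumed business-plan inputs (hypothetical)
expected_visits_per_hour = 12_000   # expected session initiations per hour
average_session_minutes = 8         # expected length of a use session
registered_accounts = 250_000       # maximum number of user accounts

# Session initiation rate, scaled for peak load
peak_initiations_per_hour = expected_visits_per_hour * PEAK_FACTOR
sir_per_second = peak_initiations_per_hour / 3600

# Concurrent users follows from initiation rate and session length
# (a simple steady-state approximation: arrivals per minute * minutes in session)
concurrent_users = (peak_initiations_per_hour / 60) * average_session_minutes

print(f"SIR (peak session initiations per second): {sir_per_second:.1f}")
print(f"CU  (peak concurrent users):               {concurrent_users:.0f}")
print(f"UA  (maximum user accounts):               {registered_accounts}")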
Unfortunately many testers still find this impossible to accomplish and are forced to revert to an empirical method. This is possible only if the application is already in production or available for testing in a reasonably production-like environment. A preliminary load test is performed on the application using a simple virtual user script such as a typical visit or just random navigation (without infrequently-occurring user transactions). The same exercise is repeated on the important user transactions, again using a simple script which simply carries out a typical transaction without variation or mistakes. In all cases, the number of VUs is increased slowly until the performance requirements (discussed below) are not being met, that is too-slow responses are being experienced, by 25% of them. This is taken to be the point at which the application is reaching its capacity.
Obviously this figure is arbitrary and the method far from satisfactory. However it does stand a reasonable chance of being sufficiently accurate, used as a starting point for test design, to at least help limit waste. In fact, if the organizational, technical and project situation makes it cheap, it may well be worth doing even if accurate estimates based on the business plan are available, as a check which might show that the performance required by that plan is unfeasible, so testing should not begin. Note that, because of the simplicity of the scripts, it must not be used the other way round: that is, do not say "the preliminary testing shows the current performance specifications can be met easily, so the business can consider making them more ambitious".
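For teams forced into the empirical method, the ramp-up logic is simple enough to sketch. The function run_load_step below is a hypothetical stand-in for whatever load tool is actually in use; the 25% threshold is the arbitrary figure discussed above.

# Sketch of the empirical capacity estimate: ramp the number of virtual users up
# slowly until more than 25% of them experience responses slower than the agreed
# maximum. run_load_step() is a hypothetical stand-in for the load tool in use.

def run_load_step(virtual_users: int) -> list[float]:
    """Placeholder: drive the application with this many simple VU scripts and
    return one representative response time (seconds) per virtual user."""
    raise NotImplementedError("replace with a call to the load tool in use")

def estimate_capacity(max_response_time_s: float, step: int = 10, ceiling: int = 10_000) -> int:
    """Return the VU count at which more than 25% of users see too-slow responses."""
    vus = step
    while vus <= ceiling:
        response_times = run_load_step(vus)
        too_slow = sum(1 for t in response_times if t > max_response_time_s)
        if too_slow / len(response_times) > 0.25:
            return vus          # the application is reaching its capacity here
        vus += step             # increase the load slowly and try again
    return ceiling              # capacity not reached within the ceiling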
The worst situation of all is having to begin test design based on the tester's own estimates of needed capacity, or to proceed to test execution with no clear idea of these and having to find them out empirically. Only very good luck can prevent this leading to great inaccuracy, delay and waste in the testing, development and, usually, business effort.
User behaviour
Unusual action sequences, making many mistakes, breaking out of user transactions before they are completed and rapid repetition of actions have the effect of increasing the load caused by one user to that which would be caused by more than one normally behaving user. Long pauses between actions (think times) increase concurrency and can also cause sudden changes in total load as actions of users working at different speeds become and cease to be simultaneous.
It is futile at the specification stage to try to define a "normal" user. What test design needs to know is the required range of speed (average pause time) and accuracy (tendency not to make mistakes). Note that in modern responsive web and mobile applications, pause times are shorter than for older, less responsive web applications: do not consider the time between form submissions, but that between clicks and keypresses on a form which may interact with the user after each of them via AJAX etc.
Now, for the target audience, decide how much longer this time may be for the slowest user compared with the average user, and how much faster for the fastest user. Taking the time for the average user to be 0 arbitrary units, the range can now be expressed as, for example, -0.5 to +0.7. The same approach is used for the tendency of users to make mistakes, where a user who makes the average number of mistakes per n actions is taken as 0 arbitrary units.
It is also vital to realise that no real user works at a constant rate or accuracy, due to being affected by interruptions, distractions etc. In the most extreme case, VUs must vary from the lowest to the highest points on the defined range within a single user journey.
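As an illustration only, the following sketch shows one way a VU script could turn the specified relative range into concrete think times that drift across the whole range within a single journey. The average pause and the jitter are example values, not recommendations.

# Sketch: turning a specified relative range (for example -0.5 to +0.7 around the
# average pause time) into concrete VU think times that also drift within a single
# user journey. The average pause, range and jitter are example values only.
import random

AVERAGE_PAUSE_S = 4.0         # example average time between clicks/keypresses
RELATIVE_RANGE = (-0.5, 0.7)  # specified range around the average (arbitrary units)

def think_time(position_in_journey: float) -> float:
    """Pause time for a VU at a given point (0.0 to 1.0) through its journey.
    The VU's speed drifts across the whole specified range during the journey,
    with some random jitter, so that extreme as well as average behaviour is
    exercised by the same script."""
    low, high = RELATIVE_RANGE
    drift = low + (high - low) * position_in_journey   # sweep from fastest to slowest
    jitter = random.uniform(-0.1, 0.1)                 # small per-action variation
    return max(0.0, AVERAGE_PAUSE_S * (1.0 + drift + jitter))

# Example: pause times at the start, middle and end of a journey
for p in (0.0, 0.5, 1.0):
    print(f"position {p:.1f}: pause {think_time(p):.2f}s")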
Deciding the distribution of VUs of these different characteristics and how they should vary dynamically is the job of test design. In practice, the average speed user will be taken as working at the fastest speed the load injector can achieve. In the same way, initial VU scripts tend to assume no user errors. Both are fine, provided the necessary variation is then added. Not doing that is like trying to check the structural soundness of a building without going in, just by hammering on the front door.

Specifying the range of variation for which assurance against failure must be provided enables test design to find creative ways to do so. Without that specification, testing is degraded to guesswork.
Load levels and response times
Once the capacity figures are known, load levels based upon them are defined: "no load" (less than 40% of capacity), "low load" (40-80%) and "high load" (80-90%). There is no need to deal with higher loads: any exact load, including 100%, is practically impossible to achieve accurately by any means, therefore any test result obtained by an unknown approximation of it is meaningless. For the same reason, in testing it is best to aim for the middle of the band, that is 20%, 60% and 85% load. It might well be desired to apply load above capacity, but there is no need to specify performance at that load; the aim there is to predict other types of failure and their likely or possible impact. In estimating the capacity, we assume that performance failure will occur at loads higher than it.
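A minimal sketch of the band arithmetic, using an example capacity figure:

# Sketch: turning an estimated capacity (maximum concurrent users) into the VU
# counts actually applied in testing, aiming at the middle of each load band.
capacity = 4_800   # example figure; use the estimated maximum concurrent users

load_bands = {
    "no load (under 40%), test at 20%": 0.20,
    "low load (40-80%), test at 60%":   0.60,
    "high load (80-90%), test at 85%":  0.85,
}

for label, fraction in load_bands.items():
    print(f"{label}: {round(capacity * fraction)} concurrent virtual users")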
For each of the important user transactions, and for each of the three load levels, maximum and average response times are decided. It must also be decided whether these should or should not include network transport times: in other words, whether the time should be measured from the requestor's point of view, from the time the request is made (eg user touches button) to the time the response is fully displayed, or from the responder's, from the time the request is fully received to the time the response has been fully sent.
There are arguments for both approaches. The transient network conditions are unknown and beyond control; including them in the specification necessarily includes error in the testing. Even if conditions are fairly stable, entropic events such as packet loss will always cause some VUs to report performance failure. On the other hand, these factors all affect the user experience, which is the most important consideration. Moreover, that consideration is really the only way to decide what are the desirable and tolerable response times. Beware of oversimplified statements along the lines of "x% of users abandon after y seconds delay". It is necessary to consider the sequence of actions taken by the user before the transaction and the amount and nature of the data to be displayed after it, especially when errors occur (such as the submission of incomplete or invalid data). All of these affect the user's expectations and behaviour.
There is one special case where the response time must be specified as purely the time taken by the responder: when that responder (a component such as a database or mainframe) is responding to more than one requestor. A typical example of this situation would be both web and mobile applications connected to the same external subsystem. While empirical testing including both working simultaneously may well be desirable, and is made possible by modern developments in tools and environment provision, that will happen far too late to discover that a key component on which both depend cannot provide adequate performance. Testing of each requestor must assure that the response time under maximum load of that requestor is at most that needed to meet the user-experienced performance specifications, divided by the number of requestors which will be connected to the responder in production. This response time is the value that should be specified. Note that in almost all cases it will not be necessary to delve into system architecture: the only important point is that at which requests from multiple requestors converge. Subcomponents beyond this point can be considered as a single component.
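A small worked example, with invented numbers, of the division described above:

# Worked example (invented numbers): a database is shared by a web application and
# a mobile application (two requestors). To meet the user-experienced specification
# for the affected transaction, the responder may consume at most 1.2 seconds.
responder_budget_s = 1.2        # example: responder's share of the response-time budget
requestors_in_production = 2    # web app + mobile app share the component

# The response time to specify and test for the shared component (SCRTmax),
# measured at the component and excluding network transport time:
scrt_max = responder_budget_s / requestors_in_production
print(f"SCRTmax: {scrt_max:.2f} seconds")   # 0.60 seconds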
The minimum set of performance specifications
In summary, we have defined the following values. For the application overall: maximum session initiations per unit time, maximum concurrent users and maximum number of user accounts. For the users: relative range of variation of time between actions and rate of making mistakes. Then for every important user transaction: maximum concurrent users carrying it out, maximum response time and minimum response time. Setting these values before test design begins gives performance testing clear targets and makes it more quickly effective and less wasteful. A template to record these values is shown in figure 1. When all the symbols have been replaced by quantitative values, minimum performance specification is complete.
Henrik Rexed is a performance specialist at Neotys (http://neotys.com)
Symbol | Definition | Note
SIR | maximum session initiation rate per second | of entire application
CU | maximum number of concurrent users |
UA | maximum number of user accounts | if applicable
UPmax | maximum user pause (time between actions, seconds) | relative to average
UPmin | minimum user pause (time between actions, seconds) |
UMmax | maximum number of user mistakes per n actions |
UMmin | minimum number of user mistakes per n actions |
SCRTmax | maximum response time of shared component connected to this and other apps, divided by the total number of apps connected to it (seconds) | not including network transport time; for complex applications there may be more than one such responder component
IUT | number of identified important user transactions |
RTmax[n] | maximum response time of user transaction n (seconds) | for n = 1 to IUT
RTmin[n] | minimum response time of user transaction n (seconds) | for n = 1 to IUT

Figure 1: performance specification template
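As an illustration of how the completed template might be carried into test design, here is a minimal sketch that captures the figure 1 values as a data structure. All values and transaction names are placeholders, not prescribed figures.

# Minimal sketch: capturing the figure 1 template as a data structure once the
# quantitative values have been agreed. All numbers below are placeholders.
from dataclasses import dataclass, field

@dataclass
class TransactionSpec:
    name: str
    rt_max_s: float   # maximum response time (seconds)
    rt_min_s: float   # minimum response time (seconds)

@dataclass
class PerformanceSpec:
    sir_per_s: float      # maximum session initiation rate per second
    cu: int               # maximum number of concurrent users
    ua: int               # maximum number of user accounts (if applicable)
    up_max: float         # maximum user pause, relative to average
    up_min: float         # minimum user pause, relative to average
    um_max: int           # maximum user mistakes per n actions
    um_min: int           # minimum user mistakes per n actions
    scrt_max_s: float     # maximum response time of shared component (seconds)
    transactions: list[TransactionSpec] = field(default_factory=list)

spec = PerformanceSpec(
    sir_per_s=10.0, cu=4_800, ua=250_000,
    up_max=0.7, up_min=-0.5, um_max=4, um_min=0, scrt_max_s=0.6,
    transactions=[
        TransactionSpec("search catalogue", rt_max_s=1.5, rt_min_s=0.2),
        TransactionSpec("submit order", rt_max_s=3.0, rt_min_s=0.5),
    ],
)
print(f"IUT (important user transactions identified): {len(spec.transactions)}")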
Testing in the lead

QA of testing

by Gregory Solovey and Anca Iorgulescu

Gregory Solovey and Anca Iorgulescu present their audit-based approach

Replace retrospective test assessment with real-time test monitoring
Insufficient code coverage which does not increase with each release means there are test cases that should be in the automated regression suite but are not. This happens, despite the presence of an established testing process, because the problem is identified too late to do anything about it: coverage becomes visible only at the end of a release, when the full planned automation is delivered. During the release, low coverage is blamed on instability of the product code (it is difficult to automate testing when code keeps changing), but when the release is done, all resources move to the next one, so there is often no opportunity to improve coverage even when the code has stopped changing.
In this article we present our approach to detecting and resolving test-related inconsistencies immediately as they appear, throughout the software lifecycle, using auditing.
Coverage doesn't cover it
Coverage, of code or requirements, is a fairly poor indicator of test quality. Consider the requirement "If the processor occupancy reaches 80% or the memory usage reaches 70% during the boot sequence, an overload warning should be generated". As written, it could be covered with one test case, which clearly would not detect many possible defects.
Now assume the requirement is implemented by this code:

IF ((CPUOccupancy > 80% OR MemUsage > 70%) AND State eq "boot")
{ sendWarning("Overload"); }
The most commonly used code coverage criterion, decision coverage, can be achieved with two obvious tests:

CPUOccupancy = 80%; MemUsage = 69%; State eq "boot"
CPUOccupancy = 81%; MemUsage = 70%; State eq "boot"
Neither of these would detect the defects in the following incorrect versions of the first line:

IF ((CPUOccupancy > 80% OR MemUsage = 70%) AND State eq "boot")

IF ((CPUOccupancy > 80% OR MemUsage > 70%) OR State eq "boot")
Test completeness can only be achieved
by using the right test design methods for
all requirements.
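To illustrate the point, here is a sketch of the fuller set of tests that boundary value analysis and condition-level design would produce for the example requirement. The expected results are stated directly from the requirement as quoted, reading "reaches 80%" as "is 80% or more"; that reading is an assumption that would need to be agreed with the requirement's author.

# Sketch: a fuller set of tests for the example requirement, derived by exercising
# each condition at its boundary while holding the others inactive. Expected results
# follow from the requirement as quoted, reading "reaches 80%" as ">= 80%".

def overload_warning_expected(cpu_pct: float, mem_pct: float, state: str) -> bool:
    """Expected behaviour, stated directly from the requirement."""
    return (cpu_pct >= 80 or mem_pct >= 70) and state == "boot"

# (cpu %, mem %, state, expected warning)
test_cases = [
    (79, 69, "boot", False),    # both just below their thresholds: no warning
    (80, 69, "boot", True),     # CPU boundary alone triggers the warning
    (79, 70, "boot", True),     # memory boundary alone triggers the warning
    (80, 70, "normal", False),  # thresholds reached, but not during boot
]

for cpu, mem, state, expected in test_cases:
    assert overload_warning_expected(cpu, mem, state) == expected, (cpu, mem, state)
print("all example test cases pass against the requirement as stated")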
Accepting that fact as a starting point, we can envisage a system that will detect all possible defects, shown in figure 1. A series of audits, triggered after each lifecycle phase, can ensure that the test process meets the predefined guidelines.
Critical test quality gates
We check the quality of testing at the following reviews:
requirements/architecture documents review
design documents review
test plans review
test code review
test coverage review (for each build released).
At each of these, the following series of steps is taken:
generate metric(s) and upload them into the appropriate database
perform an audit to verify the new metrics' values against prevailing standards
communicate audit results to everyone with responsibility for this quality gate.
The audit detects any test degradation and everyone interested is notified immediately.
Metrics for requirements/architecture documents
Typically, these reviews focus on making sure that the customer requests are properly represented. Test aspects can be stressed insufficiently. However, the following two metrics are essential and must always be taken:

Traceability, the extent to which requirements have unique IDs for cross-reference with test cases, results and incidents.

Testability, the extent to which test cases meet the controllability and observability principles. To be counted, a test case must be executable (controllable) and the result of execution must be obtainable from an external interface (observable). If some requirements are not testable, the requirements document should identify an alternative means to test them, for example test harnesses.
Metrics for design documents
Typically, a design review aims to verify the adequacy of the transformation of the business logic into the implementation details. From a test quality perspective, a design document review has to report on two aspects: test harness implementation and presence of unit tests.

Test harness implementation verifies that, if the requirements document promises a test harness, the design document covers it. If it does, then we can be confident to some extent that this execution and automation will happen. However, designers may, in their design process, discover ways in which some of the test harness could be replaced by unit test cases which could be executed by programmers.
All relevant documents are reviewed (and amended if needed) and used by testing
Every requirement is testable
Every requirement is covered by a complete set of tests
Every test is automated in parallel with product development
Every test is executed for every build released
Figure 1: attributes of a test process to detect all possible defects
Figure 2: audit and notification system (elements shown include the software lifecycle reviews: requirements, design, test plan, test code and test build; databases: document, test, code, defect and project management systems; and the audit and notification system: connection manager, audit management system, audit engine, notification subscription and notification generator, producing immediate notifications, persisting issue reports, resolution reports and audit data for the dashboard)

Figure 3: test quality dashboard (release/project layer and group/team layer)
In this case the design of those parts can be replaced, in the design document, with full specifications of the tests and commitment that they will be automated.
Presence of unit tests is the extent to which those unit tests have been specified in the design document. Importantly, its measurement does not include assessment of the completeness of the unit tests. That is not within the scope of a design review: the completeness of tests for specific requirements will be reported by the test plan review.

The design document should specify the tests at whatever level is necessary to explain how they will deliver testability. It aims to show that if a specific acceptance test fails, a specific unit test would necessarily fail: for example a test for null pointers, memory leaks, array indices out of bounds, stack overflow etc. Execution of that unit test could be done by any available method, including static analysis, dynamic analysis and load testing with utilization monitoring.
Metrics for test plans
Test traceability measures the extent to which the requirements are covered by reviewed test cases.

Completeness of reviewed test cases measures the extent to which they cover all possible implementation errors, which could be achieved by using standard test design techniques. It is assumed that all testers are trained in test design techniques and can verify the correctness and completeness of the presented test cases.
Metrics for test code
Testware maintainability measures
the immunity of testware to produc-
tion code changes. Ideally, a single
change in the production code, APIs,
or related interfaces should lead to a
single change in the testware.
Test code is a development
product and needs to follow the
same process and meet the same
standards as production code. It is
assumed and required that, within
that process, lines of test code
reviewed, defect density of test
code and inspection rate of test
code are measured.
Metrics for automated test coverage
Extent of test automation is the percentage of the test cases identified in the test plan which have been automated. The test automation code should be delivered at the same time as the code it will verify. This is especially important when code is released frequently, eg in continuous integration environments, since the test code is needed for sanity checking and regression testing, as well as to assure correctness of new functionality.

During automated test execution, for each software delivery, the code coverage metric can also be reported. Although we noted above that it is a poor indicator of test quality, code coverage is a good indicator of a test team's productivity. Measuring it here gives a useful comparison across different test projects or groups. It is important that the same coverage level is maintained across all of them.
Implementation of audits
These and other test-related metrics are held in databases such as requirements management, test management and review management tools. They are raw data, meaningful only when audited against specific guidelines which define their interpretation, typically using ranges. An audit and notification system (figure 2) performs this task and reports to subscribed personnel as needed. The reports are of two kinds: notification immediately an issue is detected, and compiled information on persisting issues.
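As an illustration, here is a minimal sketch of the kind of audit rule such a system applies: a raw metric value is checked against a guideline range and a notification is produced the moment it falls outside. The metric names and thresholds are examples, not prescribed values.

# Minimal sketch of an audit rule: a raw metric value is compared against a
# guideline range and a notification is produced immediately when it falls outside.
# The metric names and thresholds are examples, not prescribed values.

GUIDELINES = {
    # metric name: (minimum acceptable, maximum acceptable)
    "requirement traceability %": (95.0, 100.0),
    "test automation extent %":   (90.0, 100.0),
    "code coverage %":            (70.0, 100.0),
}

def audit(metrics: dict[str, float]) -> list[str]:
    """Return one notification per metric that violates its guideline range."""
    notifications = []
    for name, value in metrics.items():
        low, high = GUIDELINES[name]
        if not (low <= value <= high):
            notifications.append(
                f"AUDIT: {name} = {value:.1f} outside guideline [{low}, {high}]"
            )
    return notifications

# Example: metrics gathered after a review or a build
for message in audit({"requirement traceability %": 88.0,
                      "test automation extent %": 92.5,
                      "code coverage %": 41.0}):
    print(message)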
The output of the audit system is also presented in a test quality dashboard (figure 3). Its purpose is to provide easy comparison of test quality for different releases, technologies, features and products.
Cost of implementation
Ensuring that the test process is always formal, complete, straightforward and maintainable, in real time, decreases the cost of testing dramatically because it prevents test degradation from causing unnecessary rework, to both tests and the product. When we prototyped the process described above, manually, at Alcatel-Lucent, with no change in how test design or automation itself was done, 70% code coverage was achieved. The existing test automation processes, with no quality feedback, achieved only 40% code coverage.
Gregory Solovey PhD is a distinguished member of technical staff at Alcatel-Lucent. He is currently leading the development effort of a test framework for continuous integration.

Anca Iorgulescu is a software developer and agile coach at Alcatel-Lucent. She is responsible for the establishment and automation of quality processes and audits for wireless products.
Testing in the lead

Get ready for testing in the front line

by Nick Mayes

The latest forecast from PT's weatherman Nick Mayes

As software becomes more important, so do we
Testing is about to face a massive challenge. Two vital, yet divergent, business demands depend upon it.

The first is economic. In that, testing is the victim of its own success. All reasonably well run businesses now consider good testing necessary. Until recently, some of them considered it optional. We testers have won our most important victory.

But now that we are an integral part of our organization, we are required to contribute to its effort to reduce "lights on" operating costs. So, testing has to be done more efficiently, ie cheaper. Those who understand testing and those who do not still agree that this can be obtained by greater centralization of testing resources, optimization of low-cost delivery teams and the use of standard tools and methodologies. All these act to reduce the amount of testing effort required.
The second challenge is practical: the digital agenda of the business. As other contributors to this issue of PT infer, it is the nature of software to change, and so to change the world. In the world as software currently has it, time to market outprioritizes cost.

Any ambitious business must now modernize and innovate areas such as mobile applications, self-service websites, social media analytics and multi-channel e-commerce platforms just to remain competitive. Put another way, the demands of its customers are influenced by its competitors, all online. Brand competition is now in real time.
But the digital strategy is usually led by sales and marketing, who often succeed in circumventing the CIO and undertaking their own development projects, often leveraging cloud-based tools and platforms.

Testers need to get their arms around both of these dynamics. In this article I will predict how test organizations will need to adapt to the market dynamics of digital and cloud computing and yet remain efficient.
Digital transformation
Everyone in business, in all sectors, is
wrestling with this concept.
Digital technologies are revolutionizing how companies, established and new, interact with their customers, who are shopping on their mobile devices, sharing their views on products via social media and leaving data across the web that enables analysis of their behavior: not only their cash purchases, but the free resources they access and exchange: music, photos, videos and any other digital content. If you are in the cloud, the cloud knows you, and what you are likely to do and buy.
Timely performance testing prevents technical debt
Agile development particularly needs early and frequent performance testing. Performance should be tested at the very first iteration and SLAs assessed precisely at every sprint. That way, developers know immediately that something in a particular build has caused deterioration. Learning that several builds later makes isolation a nightmare and debugging exponentially more expensive.

Delivering actionable insight to developers quickly
Agile developers need to know more than just that their code is causing performance issues: they need to know when their code started causing problems and on what story they were working when the issue started. It's a huge pain for developers to be forced to go back and fix code for a story they worked on weeks or months ago. It also means they can't spend time working on getting new features out the door. It is important to address performance issues early in the cycle so that important feedback can be delivered to the developers. This is crucial to saving costs.

But many teams find timely performance testing hard to achieve in practice. Functional testing and debugging inevitably takes priority and delays the start of performance testing, creating the need for additional, wasteful hardening iterations.
www.neotys.com
Why and how performance testing should be done earlier
Diagram: the waterfall method (discover, design, develop, test as sequential phases) compared with the agile method, in which every sprint (#1, #2 ... #n) contains its own discover, design, develop and test activities.
Late performance testing causes late releases and late features
If performance testing is done near the end of a development cycle, there will be little time to fix defects detected. Just as with functional defects, that delays releases and/or causes features that users need to be removed from them, or the decision to release defective software with significant risk of performance failure in production. Worse still, if the performance defects are found to be fundamental, they may require painful architecture-level changes taking a very long time.
How can you change this situation?
1. Put performance SLAs on the task board
Performance needs to receive continuous attention. User stories that are written from the functional perspective ("as a user, I can click the 'view basket' button and view the 'my basket' page") can be qualified with performance information ("as one of 1,000 logged-in, active users I can click the 'view basket' button and view the 'my basket' page less than one second later"). Doing this to all user stories could become cumbersome, but most instances of it can be replaced with application-wide stories ("as a user every page I request loads in less than one second"), by adding the SLAs to a list of constraints (including functional constraints) that must be tested for every story, or by including specific SLAs in acceptance test criteria so that a story cannot reach "done" until they are shown to be achieved. This last approach works particularly well when changes made for a story affect a relatively small proportion of the codebase, so performance defects introduced will likely affect only a small section of application functionality. A sketch of the kind of automated check this implies appears after this list.
2. Anticipate changes to performance test design
Provided testers stay engaged with the team and plan ahead, testing can stay ahead of the curve. One of the best things about agile for testers is that they learn about updates to development tasks in meetings with the developers themselves and can think immediately about how they will test the stories currently being coded. This thinking should include performance testing. Will new scripts and/or configurations be needed, or can existing ones be modified?
3. Get performance testing started faster
You can't performance test early if setting up to do it takes too long. Choose a tool with time-saving features out of the box that help you get results fast: wizards, advanced pickers, automatic parameter handling etc.
4. Keep performance testing agile
You can't performance test frequently if test design, maintenance and execution takes too long or causes process bottlenecks. Choose a very intuitive, recording-based tool to create scenarios 30-50% faster than by scripting, with content updating tools that identify when application changes affect test scenarios and re-record only the parts necessary.
5. Report performance defects effectively and efficiently
Agile teams don't need to know just that a test failed. They need to know exactly what in the application or infrastructure did not meet performance SLAs. Choose a tool with powerful, detailed comparison of test results and simple report generation to deliver immediately actionable information.
6. Collaborate on performance testing
Testing workflows are lumpy and performance testing is no exception. Agile is about everyone being able to help, when needed, with whatever is most urgent. Choose a tool with advanced sharing of all assets including virtual user profiles, load profiles, populations, monitoring and results so that flexible teams can work together, always with and on the latest versions.
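As an illustration of point 1, here is a sketch of the kind of automated check such an acceptance criterion implies. The measurement function is a stand-in for whatever load tool the team uses; the SLA and user population are taken from the example story.

# Sketch of the automated check implied by the example acceptance criterion:
# with 1,000 logged-in, active virtual users, the "my basket" page must load in
# under one second. measure_response_times() is a stand-in for the team's load tool.

SLA_SECONDS = 1.0
ACTIVE_USERS = 1_000

def measure_response_times(page: str, concurrent_users: int) -> list[float]:
    """Placeholder: drive the page under the stated load and return response times."""
    raise NotImplementedError("replace with a call to the team's load-testing tool")

def test_view_basket_meets_sla() -> None:
    times = measure_response_times("my basket", ACTIVE_USERS)
    worst = max(times)
    assert worst < SLA_SECONDS, f"slowest 'my basket' load was {worst:.2f}s (SLA {SLA_SECONDS}s)"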
Do you want to know how to get started with the new generation of advanced, accurate, fast performance testing?
Please see Henrik Rexed's article on page 4 of this issue of Professional Tester, then attend our webinar "Specifying accurate and testable performance requirements". Our experts will explain how to get the numbers nearly right first time, so you can start performance testing even faster.
Neotys performance testing webinar, 7th October 2014
To register please visit: neotys.com/ptwebinar
Simply Powerful Load & Performance Testing
Chart: the cost of change rises over time, through discovery, design, development, testing and production.
As an example sector, retail is not the most important, but is easy to understand. Online-only retailer ASOS, 14 years old, will probably make £1bn sales in 2014, helped by its superb etail site. Marks and Spencer took 90 years to reach £1bn sales. It recently spent £150m on a new website whose failure in production has damaged the company financially to a much greater tune.
At the same time, digitization continually brings about completely new business models and value propositions, driven by the digitization of services and their deployment from the cloud, connected devices, and the ubiquity of the (mobile) internet and social media functionalities. This digitization disrupts entire value chains, ecosystems and the competitive landscape in all industries and will continue to do so.
Companies in all industries, in B2C and B2B markets, are being challenged to adapt their business and technology strategies as well as their operational processes to these fundamental changes. Many talk explicitly about undergoing a "digital transformation". It is critical for survival for every business today to understand what benefits and challenges digitization brings to its industry in general and its value proposition in particular, and to formulate an overarching digital vision for the company.
What does digital transformation mean for test organizations?

1. A raft of projects focused on testing systems to ensure the seamless flow of data across all customer channels and into the back end systems. For example, many retailers are looking to become "omni-channel", whereby customers are presented with a uniform experience with a retail brand, whether they are engaging with it in store, online, via a mobile device, through a loyalty card scheme or via a service offered by a business partner (such as a click and collect service at another store or location).
2. Increased focus on user experience
testing. The interface will become an
even more important battleground
as businesses compete to provide
the best and most innovative digital
experience. Testing teams will have
to work more closely with marketing
teams and digital agencies that the
business engages for design and
development services.
The biggest challenge will be that the testing organization will find itself pulled in two directions. In order to address business risk brought on by digital business models and interactions with customers, application development and testing processes need to change. However, large enterprises formed application development and operations teams primarily to support internal facing business applications linked to back end systems, ERP, and analytics platforms.
Externally facing applications, for example ecommerce websites or web portals, are usually developed as separate projects (either internally or by a provider). The organizational structures, roles, culture and mindset needed to work as an integrated team across application development and operations teams are relatively new to most enterprise IT organizations.
Many businesses are already seeing a separation in how testing is performed between the core business and digital projects, not just in terms of the processes and cycle times, but also in the tooling and platforms. Freely available open source tools offer a cost effective way to support quick-fire digital projects, but may not lend themselves to the more industrial toolsets used in the core business operations.
The testing leadership team must decide where to strike a balance between control (demanding that all testing must be run and managed by a centralized organization) and flexibility (offering a menu of recommended/approved tools, processes and methodologies) to ensure that the business can move at the speed that it needs to while ensuring the quality of its applications.
Cloud 2.0
Many businesses aiming to embrace digital transformation will try to do so using cloud computing. See figure 1.
The adoption of cloud is entering
into a second and more ambitious
phase. Most companies have already
deployed SaaS offerings around the
edges of their core operations in
areas such as salesforce automation
and workforce management, while
exploiting public cloud infrastructure-
as-a-service to support short-term
spikes in compute requirements.
But the next five years will see businesses extend their use of both private and public cloud environments to support more critical workloads. Large enterprises will start to move their ERP platforms towards private cloud delivery, while the maturing of cloud development platforms will see more organizations build their new applications for cloud delivery from the outset.

PAC believes that cloud will pose three new challenges for the testing organization in 2015 and beyond.
Firstly, they will look to extend their use of cloud-based testing environments in order to keep up with the pace of new development projects. Many testers have used the likes of AWS or Google to spin up platforms quickly to support short-term projects, which has proved a much more cost effective way than investing in new servers internally.
However, they will need to take a much
closer look at the economic business
case for cloud-based testing platforms
as the workloads increase in both
volume and complexity. An increasing
amount of new development work is
being driven and overseen by busi-
ness line leaders, which makes it more
challenging for the leadership team to
understand the total amount and cost
of testing being performed. The testing
function needs to get a clear picture of
the current bills in order to make the
best judgment about how to make best
use of cloud going forward.
Other considerations will increasingly come to the fore. While scalability is the major advantage of using cloud-based testing platforms, clients will have to ensure that they can scale back as easily and rapidly as they ramp up their usage. As the business becomes more willing to use cloud platforms to support more critical workloads, security will be an increasingly important topic. The testing organization will need to check their cloud providers' security policies and robustness, particularly if applications that support customer or corporate data are in play.
The second major challenge will be the integration of new cloud-based software into the existing on-premise systems and other SaaS offerings. The share of SaaS in total application spending will increase substantially in the next three years (figure 1).

Figure 1: cloud and SaaS spending. Source: PAC
TestExpo returns to London this October!
Join us to explore the theme of Defining the Digital Testing Strategy
20% discount for Professional Tester readers!
Just quote PTEXPO20 in the comments field on our online registration form.
Visit: testexpo.co.uk
Discount valid on full price bookings made between 15th August and 30th September 2014 only. Not to be used in conjunction with other offers.
While this trend will remove some of the testing department's traditional burden of large on-premise systems implementations, the related integration work will be an increasingly important part of its workload. This will cover testing the interaction of the systems at a functional level, and also ensuring that data sharing between the applications runs smoothly.
The third big challenge will be to keep track of the raft of new cloud-based tools that will become available and to evaluate their commercial propositions, which can often be less attractive than first perceived. For example, a pay-as-you-use deal sounds great, particularly for those businesses for whom testing is a highly cyclical activity, but not so appealing if you have to commit to pay a baseline sum over a three-year minimum sign-up period.
One of the main beneficiaries of the proliferation of cloud-based testing tools will be small and medium-sized businesses that previously did not have the budget to invest in enterprise-class offerings. Larger businesses will also be able to push their incumbent tools and services suppliers for more flexible delivery and pricing models, but they need to pay particular attention to the scalability of cloud-based offerings, which may quickly hit a ceiling in terms of their cost attractiveness.
Test tools aaS 2.0
The two dominant players in the tools
space are HP and IBM, positions that
they owe in part to some major acquisi-
tions in the past decade (Mercury and
Rational respectively). Both HP and
IBM view the SaaS model as a way to
open up their tools to smaller accounts,
and also to defend their market against
emerging open source and niche tools vendors which clients are exploring as they look for specific functionality.
The challenge of selecting a testing tools supplier is further complicated by the ongoing growth of open source suppliers. Businesses have been aggressive in their adoption of open source tools to support performance and load testing due to the major cost savings that can be made, but they tend to be deployed to support specific projects or as a complement to on-premise tool suites.
Staying in the lead
So how can testing ensure that it stays on top of the new demands of digital and cloud?

For some, it will be a long, hard fight to regain absolute control, while others will refocus on central operations and position themselves as an advisor to those parts of the business driving digital projects, offering guidance on best practice, tools and suppliers. This two-speed approach is one that is favored by businesses in sectors such as insurance, where the legacy challenges in their back and middle office are so great that they have put in separate teams to ensure that their customer-facing services keep pace with the market.
Whichever path they follow, the key to the success of the testing leadership team will be stakeholder management. While the involvement of business lines in the testing process is nothing new, it is certainly becoming more important as more businesses adopt agile development and testing models, while the focus on digital transformation means that the business has increasingly demanding expectations of the look-and-feel of new applications. The lines of communication also need to be opened to the chief information security officer (CISO), as the focus of cyber attacks shifts from the network perimeter to the application layer.
All this is of course set against a back-
ground of a renewed focus on software
quality. Organizations have become
more dependent on the performance
of their applications, which means that
major failures have become front-page news. This doesn't just apply to outages suffered by online businesses such as Twitter or iTunes, but also to banks, stock exchanges, retailers and government agencies who have seen their senior executives forced to face up to media barrages following high-profile system failures in the last 12 months.
No matter how the testing function handles the balancing act, quality will remain hugely important.
Nick Mayes is a research director at Pierre Audoin Consultants (http://pac-online.com).
This article is based on research from its latest report, Software Testing in 2015.
Zero tolerance
by Sakis Ladopoulos

Sakis Ladopoulos proves by indisputable logic that testing should control the project

Software is as software does
The technological innovations that
have changed human life so much in
the last few decades, and those that
will do so even more in the near future,
are in software. IT hardware is now seen
as a mundane underlying entity needed to
support software functionality. This sudden
and dramatic change of emphasis is as
important as the industrial revolution of the
last century.
But the industrial revolution happened relatively slowly: with mistakes, but also with time for good principles and practices to be developed and refined. That is not the case for software, and those of us who care about how good software is and how well it is produced cannot help but be keenly aware of the flawed, too diverse, often chaotic processes taking place and the great dangers of that.
So there is a tendency to think about how
good manufacturing practices could be
applied to software development. That is a
good idea: PT is fond of pointing out that adhering more closely to some of them, in particular formality, would benefit many software organizations.
The mistake many such thinkers make - and it is a bad one - is to draw an analogy between manufactured physical products and software. There is no such analogy. The two could not be less similar.
Software is not a car
A manufacturing process aims to achieve
an optimal balance between quality,
productivity and cost. It is sometimes
appropriate and legitimate to compromise
quality. For example if a plastic moulding
machine can produce, per minute, either
10 perfect toy soldiers, or 25 of which
on average 5 are faulty, no-one would
hesitate to turn up the speed.
The concept of tolerance exists for the
same reason. The goal is not to keep all
products as close as possible to the centre
of the tolerance range. It is to make prod-
ucts that are anywhere within the range as
quickly and cheaply as possible. All those
products are useful.
Trying to apply this logic to software development will lead first to numerous errors causing waste and/or failure, and then to software that does not meet its requirements and is a lot worse than useless.
The term manufacturing is often used to
mean mass production, but software is
not mass produced, nor even produced
in small quantities as in, for example,
manufacture by handicraft. We do not aim
to produce a program more than once.
Despite what some developers might say,
there is no such thing as inventory in a
software process. There is only unfinished work.
A better analogy can be drawn with fabrication: consider for example a metal works commissioned to produce a unique staircase for a certain building, to the customer's unique specifications. If the product does not meet all of those specifications it cannot be fitted nor used and will rightly be rejected.
Perfect is the enemy of good enough
This commonly-heard aphorism does not mean that it is OK for delivered software to meet only part of its specification. That would be a contradiction in terms: its specifications define, precisely, what is good enough. Rather, it means that it is a dangerous mistake to continue to improve any work product beyond the point where it meets its specifications. A design document, once it can be shown to implement everything in the requirements document upon which it is based and to comply with all prevailing standards, must not be improved in ways not mandated by the requirements. If improvement is desired, the requirements must be improved first, and only then the design.
The same applies to all other work
products including the delivered soft-
ware. Before changing it, whatever is to
be changed must be traced back until
the correct product in which to make
the change is found: all the way back to
requirements if necessary. To do other-
wise is to create discrepancy, making
both the changed and unchanged items
unfit for their purpose.
Software is quality
We have seen that software is not at all
like physical products. So what is it like?
Here lies the problem. Software is not like anything else. It is by definition highly variable and totally impalpable. That is why there are so many theories and so little agreement on how best to produce it.
Be honest with yourself. However good
your software skills, you don't know much
about software. No-one does or can.
Software is more complex than chess,
bigger than the universe, better than life.
So we should look closely at what little we
do know and understand, to try to go back
to basics and identify that of which we
are sure. So: what actually is software?
What exactly is the main deliverable of a
software project? This obvious question
has been answered many times, but no
answer is or can be satisfactory: as Willis
and Danley put it, trying to answer it is like
trying to nail jello to a tree. So, it is still
asked often. I almost always ask it when I
interview job candidates.
My own answer is that software is quality. The word cannot be used as an adjective to describe software, nor as a noun naming an attribute of software. Quality is the subject of software, its essence, the reason it exists.
Software is what it does. If it does what
it should, things happen that we want
to happen. Otherwise, it causes other
things, or nothing, to happen. Desirable
happenings are the true, tangible
deliverable: the software is used by
people as its buyer intended, to do the
things its buyer wanted them to do. The
consequences of that not happening
are more tangible still.
Development is testing
It can be argued that the second law of thermodynamics dictates that it is impossible to write or test code without making mistakes. Imagine a good requirements document for a complex user-facing application, and a concerted effort to deliver those requirements within which very many mistakes are made. The program delivered does nothing other than print "Hello world" to its current output device. It meets all requirements! It is just very defective. To make it behave as it should, so that it makes desirable things happen, we simply need to repair its defects. To do that, we need first to identify them.
Frequent PT contributor Sakis Ladopoulos is a test manager at Intrasoft International
(http://intrasoft-intl.com), a testing trainer and conference speaker.
Open but hidden
by Normand Glaude

Normand Glaude on a testing responsibility and technique you may have missed

From where did the code you are testing come?
A few years ago, there was some
debate about whether large enter-
prises, governments, etc. should
buy and become dependent upon
systems and services that included
free software components. The debate
was too late, because they already had
and were. Concerns about quality and
security, while valid, could not stand
in the light of experience which shows
that, while both have defects, so far, free
software has proven to be at least as
good as proprietary software at (i) not
having them; (ii) having them found; and
(iii) getting them fixed fast. Point (ii) is
also logically obvious: general availability
of the source code enables third parties
to use a whole array of powerful test
techniques not available without it.
So, the use of complete third-party free software components is generally beneficial, but it does create a problem. This problem is something that should send a shiver down your spine when you realise its implications: code reuse - as every tester knows, a rich source of defects, and even more so when the reused code is someone else's.
Does the product you are testing include code copied and pasted from a free software source, plumbed in, and maybe hacked about a bit? What if it is defective? What if some of those defects are security vulnerabilities? You may think that an update from the original provider will fix it, but that's only if you keep up with the updates. Either way, that code is sticky. You need to know, urgently, what it is, from where it came, and if possible, from what version. I'll discuss how that can be done below, but first let us look at the implications of its presence.
What to do about suspect third-
party code
First, find the origin of the third-party
code and try to track its latest version.
If it is the same as your code, minus
the plumbing and hacking (and you
understand the plumbing and hacking
and are certain your developers did it),
chances are you are current and will not
learn much from the history of the code.
Document your finding for repeatability.
If the code is different, whether or not
you understand why, research the
history of known defects in the product
from which the code came. The release
history may well inform easy tests for
the most critical defects. But remember
that testing can show only presence,
never absence, of defects: those tests
passing does not mean that the code
is OK.
So, you must also apply whatever structural test techniques are available to you, to the greatest extent possible: use static and dynamic analysis not to assess the quality of the code, but to create test cases with expected outcomes of which you are confident, and implement and execute those test cases using unit test frameworks, debugging tools or instrumentation code as necessary.
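A minimal sketch of that last step, using Python's built-in unittest module (the routine parse_version and its expected outcomes are hypothetical stand-ins, included only to show the shape of such a test):

import unittest

def parse_version(text):
    # Stand-in for the copied third-party routine under suspicion; in your
    # product this would be the reused code itself, wired in as it really is.
    return tuple(int(part) for part in text.split("."))

class ReusedCodeCharacterizationTest(unittest.TestCase):
    # Cases derived from analysis of the copied code and from the release
    # history of the product it came from, with expected outcomes we are
    # confident about.

    def test_plain_version_string(self):
        self.assertEqual(parse_version("1.2.3"), (1, 2, 3))

    def test_leading_zeros_reported_upstream(self):
        # Seeded by a (hypothetical) defect reported against the original.
        self.assertEqual(parse_version("01.02.03"), (1, 2, 3))

if __name__ == "__main__":
    unittest.main()

Tests like these document the behaviour you rely on, so a later change to the reused code that alters it is caught immediately.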
Finally, revise your functional test plan
in the light of the new information.
Identify the tests most likely to exercise
the suspect code. Trace them back to
the requirements they were designed
to assure. Now, using what you know
about known defects in the product
from which the code came, apply the fundamental technique of error guessing to design additional tests to assure those requirements.
Is your product legal?
The use of other people's code comes with conditions, as defined in the licence of the product from which it came: many different licences exist, and by their nature they often prove to be incompatible with business objectives. The first thing to establish is what licences apply and what conditions they impose. For example, the MIT licence is one of the most permissive: to paraphrase, it says "you can do anything you like with this code except hold anyone else responsible for it", but most forms of it also include the following statement:
The above copyright notice and this permis-
sion notice shall be included in all copies or
substantial portions of the Software.
Figure 1: Protecode reporting
If your product contains even a small
fragment of code from another product
released under such a licence, but does
not include the copyright notice, you are
in breach.
Most free software licences include more conditions. For example, the very widespread GPL licence uses the popular copyleft method, which requires anything derived from anything it covers to inherit the same licence. In other words, if your product contains any GPL code, your product is also GPL and you must release its source code! If you don't want to do that, you need to replace the GPL code. Now, before anyone finds out.
You may even discover that your product contains proprietary code - source or executable - which is not validly licensed at all.
Some code is also affected by other
legal restrictions, eg export licences. For
example, if your product borrows encryp-
tion routines (one of the most frequently
reused functions) it may be illegal to
distribute it in certain countries.
Identifying reused code
So, how to find the reused code in your
codebase? Some instances, especially
more substantial ones, may be revealed
by design documentation and change
management information, or by talking to
programmers.
But it is likely that some instances will not have been recorded nor remembered. My company's product Protecode (see figure 1) detects these by scanning the product, both source and compiled code. In a similar way to how an antivirus program looks for known code, Protecode compares the target code with its vast database containing details of hundreds of millions of files. Importantly, the comparison is based on structure, not text equivalence, so it can find even code which has undergone a great deal of change.
As well as telling you from what product and version range your code came and whether it has been modified, Protecode uses the U.S. Government's National Vulnerability Database (see http://nvd.nist.gov retrieved 12th August 2014 1400hrs UTC) to alert you to known vulnerabilities in those product versions, and reports exactly what licences and other restrictions are in effect. Using it, testers can detect defects and risks associated with code reuse as soon as they arise, at any time - including very early in the lifecycle.
Normand Glaude is chief operating officer at Protecode (http://protecode.com).
Forget me not
by Edwin van Vliet

The second part of Edwin van Vliet's series on test data management

Testing needs to come out of denial about data protection
The best test data is production
data. It contains all known variations.
It is consistent with the production
database and external data ser-
vices. Its volume is perfectly correct
for relevant, cost-effective volume testing and, because it contains accurate chronological data, also for relevant, cost-effective stress testing.
Collecting and using it is very easy.
So, many test organizations use produc-
tion data, including private personal data,
for testing and ignore the many dangers.
This situation has now continued for two
decades. Everyone sane knows it must
stop. Testing adequately without breaking the law is inconvenient, difficult and expensive. But testing that depends on breaking the law won't become more widely adopted, used and integral to all work involving software, as testing should.
The letter of data protection law varies
and is open to interpretation, but there
is no need to consider the variations or
interpretations because using these in any
way would be disingenuous. The spirit of
the law is perfectly clear and its purpose
obvious: we are obliged to do all we can to
prevent any private information relating to
any individual becoming known to anyone
who does not need to know it for essential,
legitimate, operational business reasons.
Testing is not operational.
So what does "all we can" mean? Does it give us the freedom to allow a tester
(or a developer or anyone else) who is
legally bound, and trusted, not to reveal
it to anyone else nor to misuse it, to use
private information? No, because putting
it in a development or test environment
with software that is still being tested risks
breach. The only production data that can
be used legally for testing is that which will
not reveal private information.
Please note the important difference between "will not" and "cannot". Most
approaches to making production data
safe to use for testing could be defeated
by someone prepared to invest great
effort in doing so. As in all data security
work, the likelihood of that (which is, usu-
ally, a function of the gain possible from
misuse of the information obtained) must
be taken into account when deciding
what prevention measures to take. The
same thinking applies to information
which does not come under data protec-
tion law but is sensitive for business
reasons and thus could be used illegiti-
mately: for example, an organization's internal financial information.
Understanding how the available meas-
ures work, and therefore their inherent
risk, should lead to improvement: better
testing with less risk of breach.
Scrambling
Replace every alpha character in the data with "x" and every numeric character with "0". Replacing capital letters with lower case "x" may cause false incidents due to input validation code. If so, replace upper case alpha characters with "X". For example, replace J.Jansen@provider12.com with X.Xxxxxx@xxxxxxxx00.xxx.
An email address is an excellent
example of a data item suitable to be
anonymized by this simple and very
effective method. It is unlikely to be parsed, and therefore unlikely to cause false test incidents.
Note that if the genuine data contains
many instances of the characters X, x
and 0, it may be changed very little by
application of the method.
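A minimal sketch of this rule in Python (standard library only):

def scramble(value):
    # Replace every alpha character with x/X (preserving case, to keep input
    # validation happy) and every numeric character with 0; leave punctuation
    # such as '.' and '@' untouched.
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)
    return "".join(out)

print(scramble("J.Jansen@provider12.com"))  # X.Xxxxxx@xxxxxxxx00.xxx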
Shuffling
Data in one or more fields is moved
between records. To most intents and
purposes the data remains similar to
production data, but now no record is
that of a real person.
Imagine a real database of people with addresses. Sort the data in the town/city field pseudorandomly. Has the data been anonymized? Of course not. It is trivial to write a program that uses web services to derive the town/city from other address fields.
So shuffling must be applied to more fields. But that will make them invalid - that is, they are no longer addresses that really exist - and that may cause false test incidents. Depending on the functionality under test, and how it is being tested, it is often possible to avoid this by limiting the degree of change yet still achieve sufficient anonymization. Designing (based on the test item specification and design) and implementing an algorithm to do this may require significant effort and sophisticated tools.
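For illustration, a minimal sketch of shuffling a single field between records (the record layout is hypothetical):

import random

def shuffle_field(records, field, seed=0):
    # Move the values of one field between records, in place. A plain shuffle
    # may occasionally leave a value where it was; a stricter derangement can
    # be substituted if that matters for the test.
    rng = random.Random(seed)
    values = [record[field] for record in records]
    rng.shuffle(values)
    for record, value in zip(records, values):
        record[field] = value

people = [
    {"surname": "Jansen", "town": "Utrecht"},
    {"surname": "de Vries", "town": "Leiden"},
    {"surname": "Bakker", "town": "Delft"},
]
shuffle_field(people, "town")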
Blurring
Purely numeric fields are adjusted numerically. For example, the year in the date of birth of a real person may be randomized between the real year
plus or minus 10, the month between
1 and 12, and the day between 1 and
28/29/30/31 according to the random
month and year: in other words, the
person's date of birth is replaced with
a random, but feasible, date of birth.
Whether this will cause false test inci-
dents again depends on the test. It may
be possible to prevent it by applying fur-
ther constraints based on product or test
specification. For example, the blurring
algorithm can check to see whether the
blur causes the known input equivalence
partition in which the date lies to change:
for example if a person aged 18 appears,
after blurring, to be 17. If such a change
is detected, the blur is redone, until a
value that causes no change of partition
is obtained.
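A sketch of such a partition-preserving blur for a date of birth (the 18-year boundary follows the example above; the reference date used to compute age is an assumption):

import random
from datetime import date

ADULT_AGE = 18                # partition boundary from the example above
REFERENCE = date(2014, 8, 1)  # assumed "today" used by the test item

def age_on(born, ref):
    return ref.year - born.year - ((ref.month, ref.day) < (born.month, born.day))

def blur_dob(real, rng):
    # Randomize year (+/- 10), month and day into a feasible date, redoing
    # the blur until the age-based partition is unchanged.
    while True:
        candidate = date(real.year + rng.randint(-10, 10),
                         rng.randint(1, 12),
                         rng.randint(1, 28))  # day 1-28 is always feasible
        if (age_on(real, REFERENCE) >= ADULT_AGE) == (age_on(candidate, REFERENCE) >= ADULT_AGE):
            return candidate

print(blur_dob(date(1996, 5, 17), random.Random(1)))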
Again depending on whether they will cause false positives or reduce test coverage, easier but still effective methods are sometimes available. A good example is simply to set all values in the day field to 1. In her paper Simple Demographics Often Identify People Uniquely (Carnegie Mellon University, 2000: see http://dataprivacylab.org/projects/identifiability/paper1.pdf retrieved 11th August 2014 1500hrs UTC), Latanya Sweeney shows that 87% of US citizens can likely be identified from their gender, zip code and date of birth, but taking away the day of the month reduces this to 3.7%. Obviously the full records of people who actually were born on the first of the month should be removed.
Replacing
Some data fields are designed to be unique: for example, what is called in the UK the National Insurance Number and in the Netherlands the Burgerservicenummer (BSN). Such a number, associated with any other information during testing, allows that information to be associated with the individual to whom it pertains.
For many tests, fields like this can be shuffled, but this is not sufficient to protect identity. Because the field data is unique, someone who knows what it is for a specific person and finds it in a given database knows that more information about that person is also in the database. If they have any other information about the person, eg name, it is a simple matter to make the search very narrow.
So, the only effective approach is to scramble and/or blur parts of the field data itself. But for many tests, it must be kept valid according to what is considered valid by that test and anything it invokes, or false incidents will result. For example, a Dutch BSN is nine digits. It is unique, so it must be replaced. But it is subject to restrictions defining whether it is or is not valid: a weighted 11-proof which ensures that at least 2 digits are different between any two valid BSNs. Because this creation algorithm is known, a valid fictitious value can be created to replace the real value.
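A sketch of generating such a fictitious replacement, assuming the commonly documented form of the BSN 11-proof (weights 9 down to 2 for the first eight digits and -1 for the ninth; the weighted sum must be divisible by 11):

import random

def bsn_is_valid(bsn):
    # Weighted 11-proof; note that an all-zero string also passes this
    # arithmetic check, so the generator below excludes it explicitly.
    if len(bsn) != 9 or not bsn.isdigit():
        return False
    weights = [9, 8, 7, 6, 5, 4, 3, 2, -1]
    return sum(int(d) * w for d, w in zip(bsn, weights)) % 11 == 0

def fictitious_bsn(rng):
    # Draw random nine-digit strings until one satisfies the proof.
    while True:
        candidate = "".join(str(rng.randint(0, 9)) for _ in range(9))
        if bsn_is_valid(candidate) and candidate != "000000000":
            return candidate

print(fictitious_bsn(random.Random(42)))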
Chain depersonalization
Usually, anonymization must be applied
consistently to avoid loss of data integrity
and false positives. If similar or related
data exists in multiple tables and data-
bases, it needs to be anonymized in
the same way in all of them. Many tests will depend on structure: for example, suppose we shuffle surnames but the test item deals with family relations. We must use the same shuffling key values on everyone in the family, or more likely on everyone who shares a surname, or even a similar surname.
In order to be able to undo or unravel
part of the anonymization where
necessary, many test data tools use
translation tables which record what
key values have been applied to what
records. Retaining these translation
tables after the anonymization process
is clearly dangerous: they must be kept
highly secure, for example by allowing
access only via the anonymization tool.
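A minimal sketch of consistent replacement driven by a translation table (the record layouts are hypothetical):

import random

def build_translation(surnames, seed=0):
    # Map each real surname to a replacement drawn from the same set, so
    # related records in every table are changed in exactly the same way.
    rng = random.Random(seed)
    originals = sorted(surnames)
    replacements = originals[:]
    rng.shuffle(replacements)
    return dict(zip(originals, replacements))

def apply_translation(records, field, table):
    for record in records:
        record[field] = table[record[field]]

customers = [{"surname": "Jansen"}, {"surname": "Bakker"}]
relatives = [{"surname": "Jansen"}, {"surname": "de Vries"}]
translation = build_translation({r["surname"] for r in customers + relatives})
apply_translation(customers, "surname", translation)
apply_translation(relatives, "surname", translation)
# The translation table itself must be kept highly secure, or discarded.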
The translation table can also be reused
to anonymize further data for refresh
purposes. Some organizations pass
anonymized new production data to test
environments frequently. To prevent degradation of test effectiveness, and for security, the translation table is refreshed too, but perhaps less frequently. This
saves work because the new data
is anonymized, in chosen respects,
in exactly the same way as the old,
enabling existing test cases and their
associated test data to be kept valid
without requiring maintenance.
Edwin van Vliet is a senior consultant at Suprida.
The first article in this series appeared in the July 2012 issue of Professional Tester (http://professionaltester.com/magazine/backissue/PT015/ProfessionalTester-July2012-vanVliet.pdf). Next: test data strategy.