
Specification by Example

Notes:
In this unit we look at:
Difficulties in communicating with non-technical stakeholders
Need for executable specifications
Some tools and techniques you can use

Copyright


The Importance of Validation

Notes:
It doesn't matter how good your software is...
how correct / elegant / usable / fast / robust / scalable ...
(or not!)
... if nobody wants it.
If it doesn't do what the customer wants you've wasted your time and their money.
But:
People cannot communicate effectively, due in part to well-studied cognitive biases such as
the Illusion of Transparency.
People don't know what they want until they've used something that's not it.
The context in which the software will be used is continually changing during development.
Using the software changes what they want because it changes the way they work and gives
them new information and so new priorities.
Worst thing to do is to go away for six months and hand over one big delivery at the end. It will be
wrong!


Non-Technical Stakeholders

Notes:
Not everyone in the organisation is a programmer. You have to work with customers,
entrepreneurs, managers, sales people, marketing, end users, etc.
They have goals that the system must help them achieve, but don't care about the technical
details.
You know about technical details but not what the system must achieve.
These people think about problems very differently from programmers. Or rather, programmers
think very differently from normal people. The more ingrained you become in a technical field, the
more different your thinking becomes.
In particular, programmers value abstraction, but non-programmers often do not. Discussing
abstractions with your customer can lead to misunderstandings and loss of trust.


Bridging the Gap

Notes:
Any project must bridge the gap between technical and non-technical stakeholders:
non-technical stakeholders need to tell the developers what they want the system to achieve
the developers need to understand what the non-technical stakeholders want
the developers need to be able to demonstrate that the system achieves what has been
asked for...
...and if it doesn't, easily understand where the misunderstandings are and what the system
should do instead.
We'll look at some tools and techniques to help this communication.


Ubiquitous Language

Notes:
Experts in a domain use a shared (natural) language: technical terminology, jargon, acronyms.
Different organisations have their own dialects of these domain-specific languages.
Communication is facilitated if everyone involved in the project uses the same language, and the
same language is used throughout the system:

documentation.
user interface.
code.
database.
network protocols.

It takes effort to maintain the ubiquitous language throughout a codebase. Modern refactoring tools
make it much easier.
Beware: different parts of the same organisation might give the same words completely different
meanings. This can greatly complicate projects that integrate parts of the organisation that have
not directly worked together before. It can also bring to light gaps in the business processes
caused by people in different departments misunderstanding one another.


A Big Specification Document Signed Off In Triplicate

Notes:
A common reaction to the difficulty of communicating between stakeholders and developers, but
often causes more communication problems than it solves.
Takes a long time to write. Programmers don't have time to read it thoroughly.
People know it won't be changed, so they try to get every tiny thing they can think of into the
document, no matter how trivial or unnecessary.
Is usually wrong or inconsistent, but it's too big for anyone to know where.
So vague that it cannot be implemented, or so detailed that it overly constrains the implementation.
Features are not strictly prioritised, so it's impossible to know what has to be implemented and
what is just a "nice to have". Everything is top-priority.
Requires lengthy manual testing to validate that the system does what was originally documented.
Must be translated into a test plan that contains the same information presented in a different way.
End result: hinders learning and continual improvement.


Design by GUI Mock Ups

Notes:
Users find it easy to understand a system in terms of its user interface, but it is difficult to pin down
precise business rules in terms of the UI.
Discussing the user-interface too early can sidetrack requirements gathering into inconsequential
details: E.g. "What colour should the title bars be?" "Can you move that icon two pixels to the left?"
Prototype code looks just like a real running program, so users want it delivered right now, and
developers are tempted to build the real system upon the poor quality, untested prototype code.
The interface design created early can be used to guide an iterative development process. It
should not be implemented up front and then behaviour filled in behind the scenes.


Paper Prototyping

Notes:
It is useful to think about interaction design early, but it's often best to discuss that design in terms
of lo-fi mock-ups, such as paper prototypes (as shown).
However, interface designs cannot capture the underlying rules that must govern the system's
behaviour. (Google search is an extreme example).
It is easy to design the final state of the GUI on paper. You need to consider how the GUI will
evolve towards that final state as the system grows, remaining usable/convenient/pleasant along the
way.


Automatic Validation of Requirements

Notes:
We need to describe the required behaviour in a form that both developers and non-technical
stakeholders can understand but that can be validated automatically as part of the system's
automated test suite.
This means you can't use:
Plain English: cannot be automatically validated against a real system
UML: cannot be understood by most non-developers, cannot be automatically validated
against a real system.
Formal methods: cannot be understood by most people (developers and non-developers),
cannot be automatically validated against a real system
What works well is to describe system behaviour in terms of concrete examples, and use the data
in those examples to drive the real system and check the results it produces. I.e. tests, but written
in a way that non-programmers can understand.
Two ways to achieve this:
Make code look like natural language
Make natural language act like code


Domain-Specific Language

Notes:
A specification language designed to express requirements in a way that can be understood by
non-technical stakeholders and also executed as a test by the machine. The specifications can
(usually) only be written by a business person working in collaboration with a developer. But, this is
not a bad thing.
A drawback of a DSL is the lack of tool support. Modern IDEs allow programmers to make
sweeping changes to their code, confident that they have not broken anything. These changes are
not propagated to code in a custom DSL, so seemingly innocent changes can break many tests.
The cost of maintaining the tests can deter programmers from refactoring their code, leading to a
breakdown in code quality and reduced productivity.
It can take significant effort to write a Domain-Specific Language (DSL) from scratch, and the
DSL implementation itself then becomes a maintenance burden. This burden can be reduced
by building the DSL upon a generic framework. The example shown is a test for the
Cucumber framework. Cucumber provides the skeleton of a DSL that is easy to extend with
application-specific statements.
A new class of Language Workbench tools will hopefully address both these drawbacks, letting
programmers build DSLs that inherit powerful tool support from the workbench. However, these
tools are at a very early stage of development. The most complete is JetBrains' MPS.


Domain-Specific Embedded Language

Notes:
Use a flexible, high-level language, or (ab)use the system implementation language, to write
code that is easily readable by non-technical people.
A flexible, high-level language, such as Lisp, Smalltalk or Tcl.
Advantages: Flexible syntax can be easily molded into a language that precisely supports
specifying your application.
Disadvantages: Cannot share definitions between the system and the DSL (unless the
system is written in the same high-level language or the high-level language runs on the
same VM). This makes refactoring difficult. Have to write in two languages, use two sets of
tools, etc.
Make low-level code look like a high-level language (as shown).
Advantages: Can share definitions between system and tests, to aid refactoring. Can use
the same tool set. Mainstream languages (Java, C#) usually have much better tools than
scripting languages.
Disadvantages: Can be a lot of effort to work around the verbose, noisy syntax of
mainstream languages. Resulting style violates coding conventions and can be a bit of a
shock to developers who are not used to the technique.
In both cases, the code can (usually) only be understood by a business person when they sit with
a developer. But, this is not a bad thing.
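
A minimal sketch of the second approach (making mainstream code read like a high-level language), written in Java. The `TillSpec` class and its `givenProduct`, `whenCustomerPurchases` and `thenDisplayShows` methods are hypothetical names invented for illustration, and the "when" step computes the display text itself where a real specification would drive the system under test:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical embedded DSL: method chaining makes Java read like a specification.
class TillSpec {
    private final Map<String, Integer> pricesInPence = new HashMap<>();
    private String displayed = "";

    static TillSpec scenario() {
        return new TillSpec();
    }

    // "Given" step: register a product and its price.
    TillSpec givenProduct(String name, int priceInPence) {
        pricesInPence.put(name, priceInPence);
        return this;
    }

    // "When" step: stands in for driving the real system.
    TillSpec whenCustomerPurchases(String name) {
        Integer price = pricesInPence.get(name);
        displayed = (price == null)
                ? "Unknown product"
                : String.format(Locale.ROOT, "%s %d.%02d", name, price / 100, price % 100);
        return this;
    }

    // "Then" step: fail loudly if the expectation is not met.
    TillSpec thenDisplayShows(String expected) {
        if (!expected.equals(displayed)) {
            throw new AssertionError("expected \"" + expected + "\" but was \"" + displayed + "\"");
        }
        return this;
    }

    String displayed() {
        return displayed;
    }
}

class TillSpecDemo {
    public static void main(String[] args) {
        TillSpec.scenario()
                .givenProduct("Apple", 50)
                .whenCustomerPurchases("Apple")
                .thenDisplayShows("Apple 0.50");
    }
}
```

The chained calls in `main` are the part a business person would read; the class above them is the plumbing a developer maintains, and it can be refactored with the same IDE tools as the rest of the system.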


Executable Specification Documents

Notes:
Requirements are captured as documents written in a stylised form from which test scenarios can
be parsed and executed against the system.
For example, FIT extracts test data from tables in HTML (shown above), Word or Excel
documents. Test results are reported by annotating the documents to show which scenarios
work, and which do not, and why. The results act as "living documentation": a reliable, up-to-date
description of the current behaviour of the system. This is immensely valuable: most long-lived
systems have out-of-date documentation that does not help programmers understand its current
behaviour when making changes.
The FitNesse tool gives even more control to users: they can write FIT tests in a Wiki and run
them against the system immediately.
The method by which the parsed test scenarios are executed against the system can create a
high maintenance overhead. For example, FIT maps phrases in the document to the names of
Java classes, variables and functions that it invokes by reflection. This makes it difficult to change
the documents, because changing the wording of a document makes FIT try to invoke different
(nonexistent) Java features, and so the tests stop working. These problems are being addressed
by a new generation of commercial tools, such as Twist
(http://www.thoughtworks-studios.com/agile-test-automation).
The stylised form means that documents must be written by both a business person and developer
working together. But, this is not a bad thing.


Written Collaboratively

Notes:
Customers work with business analysts, developers, testers (QA) to define acceptance criteria.
The acceptance tests are a way of enabling collaboration and communication. The tests provide a
common artifact that both parties can talk about. Depending on the organisation, analysts may not
be expected to edit these executable specifications, but they should certainly be able to read and
talk through them.
It makes a real difference to development if customers/analysts/QA sit with, or are part of, the
development team, so these conversations can go on whenever necessary, without having to
arrange meetings.
There is a trade-off involved in using specification by example tools. They improve communication
between developers and other stakeholders, and so help avoid misunderstanding and wasted
effort. But they increase development effort. If the specifications are not written with and read by
non-technical stakeholders, the programmers might as well just write tests in the same language
as the system, which usually has good IDE support for refactoring.
For this reason, specification by example tools have had a greater take-up on Ruby projects than
Java projects. Ruby does not have very good language-aware editors and so the specification
documents are not much harder to manage than the Ruby code itself.


"Non-Functional" Requirements

Notes:
These acceptance testing techniques are most useful for capturing what software developers
term "functional requirements". Only software developers think that there are "non-functional"
requirements. Customers just have goals they want to achieve. Categorising their needs as
"functional" and "non-functional" is not helpful when communicating with them.
Many non-functional requirements can be test-driven. In fact, in most software development
organisations it is more common to apply a test-driven approach to non-functional requirements
than to functional ones.
For example, performance optimisation work starts by creating a performance test environment
that mirrors production and measuring system performance. The system is profiled and its
hotspots optimised. After each optimisation attempt, the performance tests are run again and the
performance improvement (or unexpected degradation) measured.
A good talk about how Google do performance testing can be seen at http://tinyurl.com/t425x


Not Everything Can Be Tested Automatically

Notes:
Not everything can be tested automatically:

Aesthetics
Usability
Fun (e.g. computer games).
Penetration testing
Things you haven't thought of (bugs, missing requirements)

Some things are not worth testing automatically because it is too expensive to automate the tests
and errors are instantly obvious by "eyeballing" output. For example, that GUI controls do not
overlap when laid out on screen.
Manual testing is always necessary but it becomes exploratory rather than scripted, supported by
tools not replaced by them. Testers try to find unanticipated scenarios that make the software fail.


Is Manual Testing Always Necessary Before Release?

Notes:
What if we instrumented our application and gathered useful measurements of user behaviour?
We could then test whether a change actually improved or detracted from our application's user
experience and profitability.
But usually:
Too few users to take statistically meaningful measurements
Users of shrink-wrap software do not want their installed applications "phoning home" and
sending unknown information from their computer to the vendor.
Cannot isolate the effect of a single change because releases happen too infrequently. Each
release contains multiple fixes and improvements.
The system does not let us isolate the user base into experimental subjects and control
group.
But the game changes at the web scale. Many thousands or even millions of people use an
application, so we can make statistically meaningful measurements. The application runs on our
servers so we can analyse our users' activity without causing concern about leaking unrelated data
from our users' personal computers. We can release frequently, without the overhead of physical
distribution.


A/B Testing

Notes:
This technique, known as A/B testing, is widely used by web application developers.
Need to engineer the system to support A/B Testing
Feature switches: control which users see which versions of which features
Gather statistics that relate features to desired outcomes. E.g. revenue earned.
High code quality, so that experiments about new features or even applications can be
performed cheaply without fear of catastrophic failure.
Automatic testing and deployment, so new features can be deployed rapidly and reliably (aka
"Continuous Deployment").
This approach is applied by so-called "lean start-ups", which use web scale, A/B Testing and
agile development practices to iteratively develop not just the company's software but its entire
business model. We will examine the lean start-up concept in more detail later in the course.
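
The feature switches and outcome statistics listed above can be sketched in a few lines of Java. This is an illustration under stated assumptions rather than a production design: the `FeatureSwitch` and `ExperimentStats` classes are hypothetical, users are split by hashing their id, and revenue in pence stands in for whatever outcome the business cares about.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical feature switch: deterministically assigns each user to variant "A" or "B".
class FeatureSwitch {
    private final String featureName;
    private final int percentInB; // share of users who see variant B, 0..100

    FeatureSwitch(String featureName, int percentInB) {
        this.featureName = featureName;
        this.percentInB = percentInB;
    }

    String variantFor(String userId) {
        // Hash feature and user together so each experiment splits users differently,
        // while a given user always sees the same variant of a given feature.
        int bucket = Math.floorMod((featureName + ":" + userId).hashCode(), 100);
        return bucket < percentInB ? "B" : "A";
    }
}

// Hypothetical statistics store relating each variant to an outcome (revenue earned).
class ExperimentStats {
    // variant -> { number of observations, total revenue in pence }
    private final Map<String, long[]> totals = new TreeMap<>();

    void record(String variant, long revenueInPence) {
        long[] total = totals.computeIfAbsent(variant, v -> new long[2]);
        total[0]++;
        total[1] += revenueInPence;
    }

    double meanRevenue(String variant) {
        long[] total = totals.get(variant);
        return (total == null || total[0] == 0) ? 0.0 : (double) total[1] / total[0];
    }
}
```

Comparing `meanRevenue("A")` with `meanRevenue("B")` only tells you something once enough users have been observed, which is why the technique depends on web scale.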
For now, back to techniques and tools that work for systems with smaller user bases.
Just because a system has few users does not mean it is not worth much. Such systems are often
much more critical than web-scale applications. E.g. reddit.com works at a huge scale but doesn't do
anything critical; a portfolio management system may have a handful of users but control the
pensions of millions of people.


FIT In More Depth

Notes:
FIT parses tables of data from requirements documents and passes that data to "fixtures" that
interpret the data as a test. Each fixture interprets a single parsed table and:

extracts inputs and expected outputs from the table cells
invokes the system, passing it the inputs
compares the actual system outputs to the expected outputs
reports which cells were correct or incorrect.

FIT annotates the document with the test results and saves it as a test report.
FIT handles the parsing of test data from documents and reporting of results. The programmer
only has to write fixtures to connect the documents to their system. This is much easier if the
system follows the Ports and Adapters architecture.
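
The four steps a fixture performs can be illustrated without FIT itself. The sketch below is not FIT's API: `TableCheck` is a hypothetical class, the table is passed in as plain strings, and the "system" is any function from quantity and unit price to a total.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntBinaryOperator;

// Not FIT's API: a hand-rolled illustration of the steps a fixture performs.
class TableCheck {
    // Each row holds three cells: quantity, unit price, expected total.
    static List<String> check(String[][] rows, IntBinaryOperator systemUnderTest) {
        List<String> report = new ArrayList<>();
        for (String[] row : rows) {
            // 1. Extract inputs and the expected output from the table cells.
            int quantity = Integer.parseInt(row[0].trim());
            int unitPrice = Integer.parseInt(row[1].trim());
            int expected = Integer.parseInt(row[2].trim());
            // 2. Invoke the system, passing it the inputs.
            int actual = systemUnderTest.applyAsInt(quantity, unitPrice);
            // 3. Compare actual and expected outputs, and 4. report the outcome.
            report.add(actual == expected
                    ? "right"
                    : "wrong: expected " + expected + " but was " + actual);
        }
        return report;
    }
}
```

For example, `TableCheck.check(new String[][] {{"2", "50", "100"}}, (q, p) -> q * p)` reports a single "right" row. FIT does the parsing and reporting for you, so a real fixture only contains the equivalent of step 2.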


Ports & Adapters Architecture

Notes:
A common large-scale structure for object-oriented systems is the Ports and Adapters architecture,
which was first described by Alistair Cockburn.
The Ports and Adapters architecture isolates the definition of the application domain model from
the program's technical infrastructure, such as databases, message queues and user interfaces.
The application domain model includes interfaces that define its relationships with the outside
world in terms of application domain concepts (Cockburn calls these interfaces "ports").
These interfaces are implemented by objects that translate the application domain concepts onto
an appropriate technical implementation (Cockburn calls these objects "adapters").
For example, suppose an application domain model defines an OrderBook interface that
represents a collection of Orders. An adapter package might map the concept of an OrderBook to
tables in a relational database by using an object-relational mapper like Hibernate.
By separating concerns, the Ports & Adapters architecture ensures that software is flexible and
easier to maintain. Technical concerns do not leak into the domain model, so domain model
classes can be used in different contexts. The application domain model is easier to read and
understand, because it does not mix different concerns: the programmer only needs to work at a
single level of abstraction to understand the code. The application can easily be ported to run on
different technical infrastructure by replacing one adapter implementation with another.
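
As a rough sketch of the OrderBook example, assuming a deliberately tiny domain: the `OrderBook` port comes from the text above, while `Order`, `OrderReport` and `InMemoryOrderBook` are hypothetical names chosen for illustration. A Hibernate-backed adapter would implement the same interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Domain model: knows nothing about databases or other infrastructure.
final class Order {
    final String product;
    final int quantity;

    Order(String product, int quantity) {
        this.product = product;
        this.quantity = quantity;
    }
}

// Port: an interface defined by the domain model, in domain terms.
interface OrderBook {
    void add(Order order);

    List<Order> orders();
}

// Domain logic depends only on the port, never on a concrete adapter.
final class OrderReport {
    static int totalQuantity(OrderBook book) {
        int total = 0;
        for (Order order : book.orders()) {
            total += order.quantity;
        }
        return total;
    }
}

// Adapter: one possible implementation of the port, keeping orders in memory.
final class InMemoryOrderBook implements OrderBook {
    private final List<Order> orders = new ArrayList<>();

    @Override
    public void add(Order order) {
        orders.add(order);
    }

    @Override
    public List<Order> orders() {
        return Collections.unmodifiableList(orders);
    }
}
```

Because `OrderReport` sees only the `OrderBook` interface, swapping the in-memory adapter for a database-backed one requires no change to the domain code.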


Acceptance Testing The Application Domain Model

Notes:
Acceptance tests that exercise the system end-to-end, interacting with it through its published
interfaces, give confidence that the entire system works. However, they are difficult to write
because systems usually involve distributed and concurrent behaviour. They are usually too slow
when there are many tests.
The Ports & Adapters Architecture makes it possible to run acceptance-tests directly against the
application domain model because the domain model is cleanly decoupled from the technical
infrastructure that connects it to the outside world. Acceptance tests can interact with the domain
model through its port interfaces and fake outgoing interfaces.
Acceptance tests written against an isolated domain model can run extremely fast. Because there
is no persistent state involved, in databases or message queues for example, it is easy to isolate
tests from one another.
Some end-to-end tests of the entire system are still needed to test the deployment and the
bootstrapping of the system.
In a poorly structured system, the domain model is tightly intertwined with the system's technical
concerns and so it is impossible to instantiate the domain model in a test environment and
exercise it in isolation from the rest of the system. In that case, slow end-to-end testing is the only
option until the code can be refactored.
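
A minimal sketch of testing a domain model through a faked outgoing port. The `Till`, `Display` and `FakeDisplay` types here are hypothetical stand-ins written for this example; they echo the till used in the fixtures later in this unit but are not the exercise code.

```java
import java.util.HashMap;
import java.util.Map;

// Outgoing port: the domain model tells the outside world what to show.
interface Display {
    void show(String text);
}

// Fake adapter used only by tests: remembers the last text shown.
final class FakeDisplay implements Display {
    private String lastText = "";

    @Override
    public void show(String text) {
        lastText = text;
    }

    String displayedText() {
        return lastText;
    }
}

// Domain model: depends on the Display port, not on any real hardware or GUI.
final class Till {
    private final Map<String, String> descriptionsByBarcode = new HashMap<>();
    private final Display display;

    Till(Display display) {
        this.display = display;
    }

    void stock(String barcode, String description) {
        descriptionsByBarcode.put(barcode, description);
    }

    void barcodeScanned(String barcode) {
        String description = descriptionsByBarcode.get(barcode);
        display.show(description == null ? "Unknown product" : description);
    }
}
```

An acceptance test creates a `Till` with a `FakeDisplay`, calls `barcodeScanned`, and checks `displayedText()`. Nothing is persisted, so each test starts from a fresh object graph and runs in memory.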


A FIT Document

Notes:
A FIT document contains plain text interspersed with tables of test data. FIT ignores the plain text.
The first row of each table contains a single cell that identifies the fixture to interpret the table.
The name of the fixture class is created by capitalising the words in the first cell and removing
whitespace. For example, the table that starts with "Given the following products" is interpreted by
the fixture "GivenTheFollowingProducts".
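
The naming rule is simple enough to sketch directly. The helper below follows the rule as stated in these notes (capitalise each word, remove whitespace); `FixtureNames` is a hypothetical class and is not taken from FIT's source.

```java
// Hypothetical helper that applies the naming rule described in the notes.
class FixtureNames {
    // Capitalises the first letter of each word and removes all whitespace.
    static String fixtureClassName(String firstCell) {
        StringBuilder name = new StringBuilder();
        for (String word : firstCell.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // a blank cell produces an empty name
            }
            name.append(Character.toUpperCase(word.charAt(0)));
            name.append(word.substring(1));
        }
        return name.toString();
    }
}
```

With this rule, rewording the first cell of a table changes the class that is looked up, which is one reason wording changes in the document can break tests.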
What the remaining rows mean depends on the fixture. FIT defines three basic fixture base
classes that interpret a parsed table in different ways and interface with Java code by reflection.
Fixtures can be written very easily by extending these classes and defining methods and fields in
the subclasses.
Alternatively, you can extend the root Fixture class. In that case you have free rein as to how the
table is interpreted but must write your own code to map table cells to the system and report test
results.


Column Fixture

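import fit.ColumnFixture;

public class TheCustomerPurchases extends ColumnFixture {
    // Input column "Product": FIT stores each row's cell value here by reflection.
    public String Product;

    // Called once per row, after the input fields have been set.
    @Override
    public void execute() {
        Product product = SystemUnderTest.productRange.productNamed(Product);
        SystemUnderTest.till.barcodeScanned(product.barcode());
    }

    // Output column "Displayed()": FIT compares the result with the expected cell.
    public String Displayed() {
        return SystemUnderTest.display.displayedText();
    }
}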

Notes:
A ColumnFixture is used to pass test data to the system under test and compare actual and
expected system outputs. It is useful if the same test needs to be repeated for different inputs. It
can also be used to set up data for the rest of the test scenario.
The ColumnFixture class must be subclassed. For each row in the table a ColumnFixture subclass
parses and stores input data from the row into its fields, executes some action using that input
data as parameters, and then compares output data returned from its methods to expected values
in the row.
If a column header is unadorned, the column contains input data. The ColumnFixture translates
the column header to the name of the field, parses the cell contents into a value of the type of the
field, and stores the value in the field by reflection.
When it has stored all the input data the ColumnFixture calls its execute() method. Subclasses
must implement this to pass the input data to the system in some way.
If a column header ends with "()" (i.e. looks like a function call), the column contains expected
output data. The ColumnFixture translates the column header to the name of a method, calls that
method on itself by reflection, and compares the result with the expected data.
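
The reflection mechanism can be imitated in a few dozen lines. The sketch below is not FIT's source code: `MiniColumnFixture` and `DoublerFixture` are hypothetical classes, only `int` and `String` columns are handled, and a row is reduced to a single pass/fail result instead of per-cell annotations.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// A hand-rolled imitation of the mechanism described above; not FIT's source code.
abstract class MiniColumnFixture {
    // Subclasses pass the stored input data to the system here.
    abstract void execute();

    // Runs one table row and returns true if every expected output matched.
    boolean runRow(String[] headers, String[] cells) {
        try {
            // Unadorned headers name public fields that receive the input data.
            for (int i = 0; i < headers.length; i++) {
                if (!headers[i].endsWith("()")) {
                    Field field = getClass().getField(headers[i]);
                    field.set(this, parse(cells[i], field.getType()));
                }
            }
            execute();
            // Headers ending in "()" name public methods whose results are checked.
            boolean allRight = true;
            for (int i = 0; i < headers.length; i++) {
                if (headers[i].endsWith("()")) {
                    String name = headers[i].substring(0, headers[i].length() - 2);
                    Method method = getClass().getMethod(name);
                    Object actual = method.invoke(this);
                    allRight &= cells[i].trim().equals(String.valueOf(actual));
                }
            }
            return allRight;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Table does not match fixture", e);
        }
    }

    // This sketch only understands int and String columns.
    private static Object parse(String text, Class<?> type) {
        return type == int.class ? (Object) Integer.parseInt(text.trim()) : text;
    }
}

// Hypothetical fixture for a table with columns "Number" and "Doubled()".
class DoublerFixture extends MiniColumnFixture {
    public int Number;
    private int result;

    @Override
    void execute() {
        result = Number * 2; // stands in for a call to the system under test
    }

    public int Doubled() {
        return result;
    }
}
```

Renaming the `Number` field or the `Doubled` method without updating the table headers makes the lookup fail at run time, which illustrates why reflection-based mapping raises the cost of refactoring.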


Row Fixture

import fit.RowFixture;

import java.util.ArrayList;
import java.util.List;

public class TheReceiptShows extends RowFixture {
    // One object per receipt line; FIT compares its public fields with the table columns.
    public static class Row {
        public int Line;
        public String Text;

        public Row(int line, String text) {
            Line = line;
            Text = text;
        }
    }

    @Override
    public Class<?> getTargetClass() {
        return Row.class;
    }

    // Completes the purchase, then returns the printed receipt as one Row per line.
    @Override
    public Object[] query() throws Exception {
        SystemUnderTest.till.paymentAccepted();
        List<Row> rows = new ArrayList<Row>();
        for (String line : SystemUnderTest.printer.output().split("\n")) {
            rows.add(new Row(rows.size() + 1, line));
        }
        return rows.toArray();
    }
}

Notes:
A RowFixture compares rows in the test data to objects obtained from the system under test.
Properties of the objects are compared to those in the table. The fixture reports if there are surplus
or missing objects.
RowFixture must be subclassed. The subclass defines the query() method to obtain a list of
objects from the system, and the base class implementation uses reflection to interrogate those
objects and compare their properties to the expected values in the table.
Unlike the ColumnFixture, which reflects upon itself, the RowFixture reflects upon other objects.
This means it can be used to test the system's actual domain objects. This helps ensure that the
system code uses the same terminology as the documentation.
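
The comparison performed by the base class can be approximated as follows. This is a simplified imitation, not FIT's algorithm: `MiniRowCheck` and `ReceiptLine` are hypothetical, rows match only when every column is equal, and results are returned as strings instead of annotations on the document.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// A hand-rolled imitation of row comparison; not FIT's source code.
class MiniRowCheck {
    // Reads the named public fields of an object into a list of cell texts.
    static List<String> cellsOf(Object object, String[] headers) {
        List<String> cells = new ArrayList<>();
        try {
            for (String header : headers) {
                Field field = object.getClass().getField(header);
                cells.add(String.valueOf(field.get(object)));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No such column on " + object.getClass(), e);
        }
        return cells;
    }

    // Marks each expected row right or missing, then lists surplus objects.
    static List<String> compare(String[] headers, List<List<String>> expected, Object[] actual) {
        List<List<String>> unmatched = new ArrayList<>();
        for (Object object : actual) {
            unmatched.add(cellsOf(object, headers));
        }
        List<String> report = new ArrayList<>();
        for (List<String> row : expected) {
            report.add((unmatched.remove(row) ? "right " : "missing ") + row);
        }
        for (List<String> row : unmatched) {
            report.add("surplus " + row);
        }
        return report;
    }
}

// Hypothetical domain object standing in for the system's own classes.
class ReceiptLine {
    public final int Line;
    public final String Text;

    ReceiptLine(int line, String text) {
        Line = line;
        Text = text;
    }
}
```

Because the column headers must match field names on the objects themselves, the table and the domain classes are pushed towards the same vocabulary.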


Action Fixture

Notes:
An ActionFixture interprets rows as a sequence of commands to be performed in order. It is
typically used to represent a user's workflow in terms of the actions they would perform
through the application's user interface.
The ActionFixture class defines four kinds of action, but subclasses can define more:

start: directs input to an object that represents part of the user interface.
enter: as if entering data into a field
press: as if clicking a button
check: as if reading data from the user interface, tests that the data has the expected value.

Too many workflow style tests make the behaviour of the system difficult to understand because
the rules by which it is governed are scattered among many workflow examples. Tests written with
RowFixtures and ColumnFixtures can describe the system's business rules concisely in a single
document.
It is easy to start by writing workflow style tests. As the system develops, workflow tests can be
refactored into concise, declarative descriptions of the underlying rules governing the system
behaviour.


Exercise

Over to YOU.

Notes:
We will now use FIT in a hands-on practical, described in the exercise sheet.
