A Survey of Software Testing Tools For Computational Science

RAL-TR-2007-010
L.S. Chin, D.J. Worth, and C. Greenough
June 29, 2007
Abstract
This report presents a summary of information gathered in considering software testing practices for Computational Science and Engineering. It includes an overview of software testing,
and provides a survey of tools currently available to assist in implementing testing solutions
for scientific applications written in Fortran.
Keywords: software testing, software quality, verification, validation, Fortran
Email: L.S.Chin@rl.ac.uk, D.J.Worth@rl.ac.uk, or C.Greenough@rl.ac.uk

Reports can be obtained from www.softeng.cse.clrc.ac.uk
Software Engineering Group

Computational Science & Engineering Department
Rutherford Appleton Laboratory
Harwell Science and Innovation Campus
Didcot
Oxfordshire OX11 0QX
Science and Technology Facilities Council
Enquiries about copyright, reproduction and requests for additional copies of this report should be
addressed to:
Library and Information Services
STFC Rutherford Appleton Laboratory
Harwell Science and Innovation Campus
Didcot
Oxfordshire OX11 0QX
Tel: +44 (0)1235 445384
Fax: +44 (0)1235 446403
Email: library@rl.ac.uk
STFC e-reports are available online at: http://epubs.cclrc.ac.uk
Neither the Council nor the Laboratory accepts any responsibility for loss or damage arising from the use of
information contained in any of their reports or in any communication about their tests or investigations.
Contents
1 Introduction
2 Software Engineering Support Programme
3 Overview of Software Testing
  3.1 Stages of Software Testing
    3.1.1 Design phase
    3.1.2 Testing phase
    3.1.3 Maintenance phase
    3.1.4 Implementation phase
  3.2 Test Design
    3.2.1 The Black Box approach
    3.2.2 The White Box approach
  3.3 Deciding on a strategy
4 Available Tools
  4.1 Testing Framework
    4.1.1 pFUnit
    4.1.2 fUnit
    4.1.3 DejaGNU
    4.1.4 QMTest
    4.1.5 Cleanscape Grayboxx
  4.2 Capture and Playback
    4.2.1 AutoExpect
    4.2.2 TestWorks CAPBAK
  4.3 Output validation
    4.3.1 TextTest
    4.3.2 ndiff
    4.3.3 Toldiff
    4.3.4 numdiff
  4.4 Test Coverage
    4.4.1 gcov
    4.4.2 Polyhedron plusFort - CVRANAL
    4.4.3 FCAT
    4.4.4 Cleanscape Grayboxx
    4.4.5 TestWorks/TCAT
    4.4.6 LDRA Testbed
    4.4.7 McCabe IQ
  4.5 Test Management and Automation
    4.5.1 RTH
    4.5.2 TestLink
    4.5.3 QaTraq
    4.5.4 AutoTest
    4.5.5 STAF
  4.6 Build Management
    4.6.1 BuildBot
    4.6.2 test-AutoBuild
    4.6.3 Parabuild
    4.6.4 CruiseControl
    4.6.5 BuildForge
    4.6.6 AEGIS
5 Further reading
References
1 Introduction
Software has a long history of being used by the scientific community as a vehicle for performing
world-class research. Such software is usually written by a variety of developers, and often evolves
over time to incorporate new algorithms, models or features, or is refactored to take advantage of
different programming paradigms and cutting-edge technology.
Test suites that accompany this software play a crucial role in checking that it functions
correctly and produces the expected results. A good set of tests serves as a safety net for developers,
ensuring that the software remains valid and internally consistent as changes are made. Additionally,
these tests allow for independent verification by end users, thus building confidence in the software.
Since most scientific software (and its test suites) is developed by domain experts rather than
software engineers, there is a tendency for the emphasis to be on the represented model or calculation.
Tests are therefore designed around checking for acceptable results rather than discovering when or how
the software might fail.
Consequently, inadequate sets of tests may lull developers into a false sense of confidence. A 100%
passing rate for a test suite that exercises only 30% of the program code could easily mislead
developers and end-users. Similarly, tests that produce correct results for a small subset of inputs
may lead to the incorrect assumption that the results will remain valid for all other inputs.
This report documents the first steps in determining strategies for adopting high-payoff software
testing practices within scientific software development. We look at well-established methodologies
practised by the Software Engineering community, as well as software testing tools that can accelerate
the process of building, running, and managing these solutions. Due to the predominance of Fortran
among scientific software projects, it is difficult for developers to take advantage of many of the
available testing tools, which are designed mainly for a general software engineering community that
has long shied away from Fortran.
Chapter 3, which draws heavily from the 2004 edition of the text by Myers [2] (footnote 1), provides a
broad overview of software testing concepts and methodology used in Software Engineering.
Chapter 4 presents a survey of software tools that can potentially be used to implement testing
solutions for scientific software. This list is quite extensive, and serves as a starting point for further
evaluation efforts.
This report is one of the outputs of the Software Engineering Support Programme (SESP).
Footnote 1: The original book, published in 1979, is often regarded as a seminal work on software testing.
2 Software Engineering Support Programme
The Software Engineering Support Programme (SESP) (http://www.sesp.cse.clrc.ac.uk/) is an EPSRC support activity to provide and encourage the use of up-to-date software engineering techniques and
tools in software development within computational science and engineering.
The main goals of the SES Programme are:
- accelerate the introduction and widespread use of high-payoff software engineering practices and
  technology by identifying, evaluating, and maturing promising or underused technology and practices;
- maintain a long-term competency in software engineering and technology transition;
- enable the UK academic community to make measured improvements in their software engineering
  practices by working with them directly;
- encourage the adoption and sustained use of standards of excellence for software engineering practice;
- foster collaborations with other groups, in the UK, Europe and the US, that have an interest in the
  applications of advanced software engineering techniques in computational science.
These goals will improve the level of software engineering practice within UK computational science
research groups. As a result, the software they develop will be of higher quality, more easily developed
and maintained, more easily re-used within the community, and computationally more efficient.
The main thrust of the programme is to gather together processes and tools that will help improve
software engineering in computational science. This can be characterised by the Technology Watch,
Assessment and Evaluation process. Although the software engineering community has various very
formally defined processes of software assessment and evaluation, a rather more pragmatic approach has
been defined for SESP.
Technology Watch: In each element of the SESP, information is gathered on a regular basis and
a rolling update is made to a Technology Report that is made available to the community
through the SESP Web site.
Assessments: The starting point for selecting a tool for serious use is a paper assessment
against a basic requirements document. The detail of the assessment clearly depends on the
area being addressed, but there will always be a collection of fundamental requirements such as
operating systems, supported languages, etc. These paper assessments can identify tools for practical
evaluation, and much of the material developed in them is added to the technology
watch reports.
Evaluations: Through the assessment, various tools will be selected for more direct evaluation. They
will be used in a realistic context, either by SESP staff or by those involved in the CCP and HEC
programmes, and their usefulness and effectiveness documented. Although in general the evaluations
would not be placed on critical paths within the CCP or HEC activities, these programmes provide
a considerable number of representative software packages that can be made the subject of an
evaluation. The evaluations would lead to detailed reports and, if successful, the deployment of the
tool or practice within the mainstream.
At present the two major foci for the programme are on software quality assurance and transformation
of legacy software.
3 Overview of Software Testing
Software testing involves more than just running a program to see whether it works. A single test run
reveals nothing about the program other than the obvious fact that it can yield results for a particular
set of inputs. Software testing should be treated as an investigative exercise; one which systematically
uncovers different classes of errors within the code while demonstrating that the software behaves as
expected.
The developer's concept of the definition and objectives of software testing plays a major role in
determining the efficacy of the activity. It influences the developer's decision on what should be tested,
and judgement on what is considered a successful test.
For example, if the definition "Software testing is a process of proving that a program is bug free"
were adopted, there would be a natural tendency for developers to subconsciously write fewer or less
destructive test cases with lower probabilities of breaking the program. Furthermore, the objective
that this definition implies is practically impossible to achieve. It takes only one failed test to prove the
existence of bugs, but an infinite number of test cases to prove otherwise. Tests can only find
defects, not prove that there are none.
A similarly delusive definition would be "Software testing is a process of proving that a program
performs its intended functions". This line of thinking often leads to test cases that focus only on
program behaviour that is inherently expected. However, programs that perform the right functions
when given a controlled set of inputs are still erroneous if they also produce unwanted side effects or
fail when given unexpected inputs. A complete test should check for both expected and unexpected
behaviours, using valid as well as invalid inputs.
Myers [2] aptly defines software testing as a process of executing a program with the intention of
finding errors. Using the analogy of a medical diagnosis, a successful investigation is one that seeks
and discovers a problem, rather than one that reveals nothing and provides a false sense of well-being.
Based on this definition, we establish that a good set of test cases should be one that has a high chance
of uncovering previously unknown errors, while a successful test run is one that discovers these errors.
In order to detect all possible errors within a program, exhaustive testing is required to exercise
all possible inputs and logical execution paths. Except for very trivial programs, this is economically
infeasible if not impossible. Therefore, a practical goal for software testing would be to maximise the
probability of finding errors using a finite number of test cases, performed in minimum time with minimum
effort.
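To give a sense of scale, the following back-of-the-envelope calculation estimates how long exhaustive input testing might take. It is only a sketch: the routine with two 32-bit integer arguments and the rate of one billion tests per second are assumptions chosen purely for illustration.

```python
# Hypothetical routine taking two 32-bit integer arguments.
BITS_PER_ARGUMENT = 32
NUM_ARGUMENTS = 2
TESTS_PER_SECOND = 1_000_000_000  # assumed (optimistic) execution rate

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def exhaustive_input_count(bits_per_argument, num_arguments):
    """Number of distinct input combinations for fixed-width integer arguments."""
    return 2 ** (bits_per_argument * num_arguments)


def years_to_run(num_tests, tests_per_second):
    """Wall-clock years needed to execute num_tests at the given rate."""
    return num_tests / tests_per_second / SECONDS_PER_YEAR


total = exhaustive_input_count(BITS_PER_ARGUMENT, NUM_ARGUMENTS)
print(f"{total} input combinations")
print(f"about {years_to_run(total, TESTS_PER_SECOND):.0f} years of testing")
```

Under these assumptions the run would take roughly 585 years, and that is before any consideration of execution paths, which is why a carefully chosen finite subset of test cases is the practical goal.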
Section 3.2 presents several test design strategies that can be used to work towards this goal.
3.1 Stages of Software Testing
Figure 1 presents an illustration of the different phases of software development with a list of activities
that make up each phase. It is an extension of the V-model, and includes an additional Maintain loop
to cater for iterative software development models (e.g. evolutionary prototyping, staged delivery, etc.)
that may be more relevant to scientific application development.
The diagram is admittedly over-elaborate as it attempts to be all-encompassing; it is not meant to
describe wholly the development process of a particular software project, but instead to provide a
correlation between the different activities that represent the building blocks of software development projects.
Developers may wish to consider only those activities relevant to their project, and from the diagram,
determine where the different software testing stages could be applied within their software development
process.
3.1.1 Design phase
The design phase represents a stream of activities where the software specifications are defined, starting
from a high level specification of requirements down to the detailed description of the implementation.
At each stage, an associated document is produced, together with the test criteria which reflect the
requirements specified in the document. If it is feasible, for instance in the case of acceptance tests based
on user requirements, the actual test cases should be written at this stage.
Test criteria drawn up at the design phase would be based on an objective view of specifications,
resulting in a more complete and accurate representation of the requirements.
Figure 1: Extended V-Model which includes a Maintenance phase

3.1.2 Testing phase
The testing phase is made up of the different stages of testing, which reflect a bottom-up correspondence
with the levels at which software is designed and built.
Unit Testing :
Testing a code module in isolation, ensuring that it works correctly as specified by the detailed
design. Good unit tests assist in future refactoring of code, since they give assurance that the
modified code still works as expected and can therefore be included into the project.
Integration Testing :
Testing of communication and interaction between different code modules that are to be integrated.
Integration tests are defined based on the architectural design of the system, and provide confidence
that all modules can work together to achieve the functionalities specified in the design.
Code Coverage Analysis :
Determining the level of coverage for previous tests. If the level does not meet a predefined threshold,
the test cases should be extended until a satisfactory coverage level is attained. Since the coverage
of test cases depends on the actual code implementation, coverage has to be re-evaluated whenever
the code changes to ensure that the coverage level is maintained. Test coverage will be discussed further in
section 3.2.2.
System Testing :
Testing of system level requirements as stipulated by the software requirements specification. This
might include tests for performance, interoperability, portability, usability, installability, etc.
Acceptance Testing :
Testing of the final product against the user requirements specification. "User" may refer to the actual
end-users of the program or, in the case of prototypes or novel applications, the developers who
define what they are attempting to achieve.
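As a minimal illustration of the first two stages, the sketch below uses Python's standard unittest module, which is not one of the Fortran tools surveyed in this report. The functions celsius_to_kelvin and mean_kelvin are hypothetical stand-ins for a low-level code module and a higher-level module that depends on it.

```python
import unittest


def celsius_to_kelvin(t_celsius):
    """Hypothetical low-level module: convert one temperature."""
    return t_celsius + 273.15


def mean_kelvin(temps_celsius):
    """Hypothetical higher-level module that relies on celsius_to_kelvin."""
    kelvin = [celsius_to_kelvin(t) for t in temps_celsius]
    return sum(kelvin) / len(kelvin)


class UnitTests(unittest.TestCase):
    """Unit test: one module checked in isolation."""

    def test_freezing_point(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)


class IntegrationTests(unittest.TestCase):
    """Integration test: two modules checked working together."""

    def test_mean_of_two_readings(self):
        self.assertAlmostEqual(mean_kelvin([0.0, 100.0]), 323.15)


def run_all():
    """Run both stages and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(UnitTests),
        loader.loadTestsFromTestCase(IntegrationTests),
    ])
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_all() executes the unit test before the integration test, mirroring the bottom-up order of the stages; system and acceptance tests would sit above these and exercise the complete program.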
There is a flow leading down from each test level back to the implementation phase. This represents
the fact that failed tests are followed by an implementation of a fix, and a re-execution of all tests. This
form of regression testing attempts to detect any new bugs that might have been introduced when the
code was modified.
While it does seem like a lot of work, there are tools that aid in managing and automating tests. Test
management and build management tools not only make it easier to run tests, they also provide other
useful functionalities such as e-mail notifications, report generation, and defect tracking. Lists of these
tools are provided in sections 4.5 and 4.6.
3.1.3 Maintenance phase
The maintenance phase begins the moment the software is released. As feedback and bug reports are
received, updates to the code are planned, and the required changes reflected in the documentation and
test cases. The process then flows back into the Implementation phase followed by the Testing phase
before the new version of the code is released.
3.1.4 Implementation phase
The implementation phase acts as the link between all other phases, and represents all activities involved in
translating ideas and design into a working program. Activities that make up this phase are not limited
to the writing of programming code, but must also include other supporting components:
Unit Tests :
Any implementation or modification of code should be followed by a relevant unit test. This ensures
that any bugs that might be present are detected early, and can be easily traced and fixed.
Change Control :
It is not unusual to require changes to the design during the implementation stage. This might be
due to a change in requirements, or an oversight during the design phase. These changes should be
reflected in the design documents as well as the associated tests.
Quality Assurance :
All written code should be put through some form of Quality Assurance (QA) process. This might
include code conformance checking, static code analysis, build tests, memory leak detection, or even
peer review.
The execution of tests and QA tools can be automated using the build management tools listed in section
4.6. Using build tools, a list of predefined actions can be executed whenever changes to the code are
detected, and notifications sent out whenever a problem is found.
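The idea of change-triggered actions can be sketched in a few lines. The example below is an illustrative model only and does not reflect how any particular build management tool is implemented: the function names are hypothetical, the sources are held in memory as a name-to-text mapping rather than read from a repository, and each action is assumed to be a callable that returns True on success.

```python
import hashlib


def fingerprint(sources):
    """Hash the contents of all source files (given as a name -> text mapping)."""
    digest = hashlib.sha256()
    for name in sorted(sources):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(sources[name].encode())
        digest.update(b"\0")
    return digest.hexdigest()


def run_if_changed(sources, last_fingerprint, actions, notify):
    """Run every action when the sources have changed; report failures via notify.

    Returns the new fingerprint so the caller can store it for the next check.
    """
    current = fingerprint(sources)
    if current == last_fingerprint:
        return current  # nothing changed, so nothing to do
    for name, action in actions:
        if not action(sources):
            notify(f"{name} failed")
    return current


# Example: a "build" step that succeeds and a "unit tests" step that fails.
messages = []
actions = [("build", lambda s: True), ("unit tests", lambda s: False)]
state = run_if_changed({"solver.f90": "program solver"}, None, actions, messages.append)
print(messages)
```

A real tool would typically replace the in-memory mapping with a poll of the version control system and the notify callable with an e-mail or web report.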
3.2 Test Design
This section briefly discusses two classes of strategies commonly used when designing software test cases.
These strategies provide a systematic approach towards creating test cases with higher chances of
discovering errors in a program, and are oriented around achieving sufficient test coverage: the black-box
approach is designed to attain good input-output data coverage, while the white-box approach focuses
instead on program logic coverage.
3.2.1 The Black Box approach
The Black Box method, sometimes referred to as Data-Driven or Functional testing, involves taking an
external perspective of the program units and ignoring the internal workings. Test cases are defined by
setting up a range of different inputs, and comparing the results of each run against a predefined list
of expected outputs. It is important that the expected outputs are defined beforehand, so that
erroneous (but seemingly plausible) results are not accepted at first glance.
Initial test cases are often derived from the software specification, with each of the specified requirements translated to a set of expected input-output pairs. While this method is useful in exposing unimplemented parts of the specification, it does not by itself provide sufficient coverage of input data.
To weed out all errors, test cases would have to include combinations of not just expected and valid
inputs but all possible inputs. For example, in the case of a program reading from file a value representing age, a valid input may be a range of positive integers, but possible inputs would also include 0,
0.12XY ZA123, 0.2e12, 12637213232, empty strings, character strings, binary data, etc.
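A black-box test for such an age reader might look like the sketch below. The function parse_age and its accepted range of 1 to 150 are assumptions made for this example; the point is that the table of expected outcomes is written down before the program is run, and that it contains invalid as well as valid inputs.

```python
def parse_age(text):
    """Hypothetical reader: return the age as an int, or raise ValueError.

    The accepted range 1..150 is an assumption made for this example.
    """
    stripped = text.strip()
    # Rejects signs, decimals, exponents, letters and empty strings.
    if not (stripped.isascii() and stripped.isdigit()):
        raise ValueError(f"not a positive integer: {text!r}")
    age = int(stripped)
    if age < 1 or age > 150:
        raise ValueError(f"age out of range: {age}")
    return age


# Expected outcomes, defined in advance.
# None means "must be rejected with ValueError".
EXPECTED = {
    "42": 42,
    "1": 1,
    "0": None,
    "0.2e12": None,
    "12637213232": None,
    "": None,
    "abc": None,
    "-5": None,
}


def run_black_box_tests():
    """Return the inputs whose observed result differs from EXPECTED."""
    failures = []
    for text, expected in EXPECTED.items():
        try:
            observed = parse_age(text)
        except ValueError:
            observed = None
        if observed != expected:
            failures.append(text)
    return failures


print(run_black_box_tests())
```

An empty list means every input behaved as predicted; any entry in the list identifies an input for which the program and the specification disagree.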
For most programs, this form of exhaustive input testing is not possible as it would involve an almost
infinite number of test cases. Instead, a compromise is made by choosing a subset of test cases that has
the highest probability of detecting the most errors.
The following are several methodologies used to select an effective subset of test cases (Examples and
detailed discussion on these methodologies are available in [2]).
Equivalence Partitioning :
The input domain is partitioned into a finite number of equivalence classes such that reasonable
assumptions can be made that a test of a representative value of each class is equivalent to that
of any other value in its class. Test cases can then be derived by gathering one value from each
partition.
Boundary-value analysis :
This method complements equivalence partitioning by concentrating on elements at the borders of
each class.
Cause-effect graphing :
Test cases are derived from the rules in a decision table. This decision table is generated from a
cause-effect graph, which is a logical representation of the functionality that the program is attempting to
attain.
Error guessing :
Test cases are written based on a list of error-prone conditions. Generation of the list relies on an
understanding of the program implementation as well as the science represented by the program,
and thus depends largely on the knowledge, creativity and experience of the developer.
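The first two methodologies lend themselves to a short sketch. In the example below, the routine classify_age and the class limits are hypothetical; equivalence partitioning contributes one representative value per class, and boundary-value analysis adds the values at the edges of each class.

```python
def classify_age(age):
    """Hypothetical routine under test: label an integer age."""
    if age < 0 or age > 150:
        return "invalid"
    if age < 18:
        return "minor"
    return "adult"


# Equivalence classes as inclusive (low, high, expected label) ranges.
# The class limits are assumptions chosen for this example.
PARTITIONS = [
    (-1000, -1, "invalid"),
    (0, 17, "minor"),
    (18, 150, "adult"),
    (151, 1000, "invalid"),
]


def equivalence_cases(partitions):
    """One representative (the midpoint) from each equivalence class."""
    return [((low + high) // 2, label) for low, high, label in partitions]


def boundary_cases(partitions):
    """Both edge values of each class, where off-by-one errors tend to hide."""
    cases = []
    for low, high, label in partitions:
        cases.append((low, label))
        cases.append((high, label))
    return cases


def failing_cases(cases):
    """Cases for which classify_age disagrees with the expected label."""
    return [(value, label) for value, label in cases
            if classify_age(value) != label]


print(failing_cases(equivalence_cases(PARTITIONS)))
print(failing_cases(boundary_cases(PARTITIONS)))
```

Four representative values plus eight boundary values stand in for the whole input range. If the second comparison in classify_age were mistakenly written as age <= 18, the midpoint cases would still pass but the boundary case for 18 would fail, which is the kind of error boundary-value analysis is meant to expose.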
3.2.2 The White Box approach
The White Box approach, also known as Logic-Driven or Structural testing, uses an internal perspective
of the program units where test cases are designed from an examination of the program logic. In its
simplest form, the White Box method can be seen as an iterative design strategy driven by code coverage
analysis (footnote 2).
Using the White Box method, a testing solution can be designed along the lines of:
1. Decide on a reasonable coverage target.
2. Analyse coverage of current test cases. Initial test cases would ideally have been derived using the
Black Box approach.
3. If total coverage is below the predetermined target, study low coverage segments of the code
4. Identify and include test cases that can increase coverage
5. Re-execute tests, and repeat steps 2-4 until target coverage is achieved
A high coverage level of a test run would indicate that most of the logical paths within the program
have been traversed, which leads to a higher chance of exposing errors within the program. Coverage
level can therefore be used, to a certain extent, as a measure of the quality of the test cases.
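The iterative procedure above can be sketched as a small loop. In this illustration the coverage measurement is done with hand-placed probes, which stand in for the instrumentation that a coverage tool would normally insert automatically. The routine sign_label, the probe names and the candidate test values are all hypothetical.

```python
ALL_PROBES = {"positive", "negative", "zero"}


def sign_label(x, hits):
    """Hypothetical routine under test, with one hand-placed probe per branch."""
    if x > 0:
        hits.add("positive")
        return "positive"
    if x < 0:
        hits.add("negative")
        return "negative"
    hits.add("zero")
    return "zero"


def coverage(test_inputs):
    """Fraction of probes reached, plus the set of probes still unreached."""
    hits = set()
    for value in test_inputs:
        sign_label(value, hits)
    return len(hits) / len(ALL_PROBES), ALL_PROBES - hits


TARGET = 1.0                       # step 1: decide on a coverage target
tests = [3.5, 10.0]                # initial cases, e.g. from the specification
# Step 4: extra cases a developer might choose after studying the code.
candidates = {"negative": -2.0, "zero": 0.0}

level, missing = coverage(tests)   # step 2: analyse current coverage
history = [level]
while level < TARGET and missing:  # steps 3-5: extend the tests and re-run
    probe = sorted(missing)[0]
    tests.append(candidates[probe])
    level, missing = coverage(tests)
    history.append(level)

print(history)
print(tests)
```

The two initial test cases both follow the same branch, so coverage starts at one third even though both tests pass; two further cases are needed to reach the target.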
The following is a non-exhaustive list of different measures of coverage, each more complete than the
last but harder to achieve (examples and detailed discussion of these measures are
available in [2]).
Statement Coverage :
Ensuring that every statement is executed at least once.
Decision Coverage :
Ensuring that every branch direction is traversed, and (for subroutines with multiple entry points)
that every entry point is invoked at least once.
Condition Coverage :
Ensuring that each condition in a branch takes on all possible outcomes at least once, and that
every entry point is invoked at least once.
Decision-condition Coverage :
Ensuring that the criteria for both decision coverage and condition coverage are satisfied.
Multiple-condition Coverage :
Ensuring that all possible combinations of condition outcomes in each decision are invoked at least
once.
Footnote 2: Code coverage is a metric which describes the degree to which a program has been exercised,
and can be determined using dynamic analysis tools. Some of these tools are presented in section 4.4.
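The difference between these measures can be seen on a routine with a single decision made up of two conditions. The sketch below is illustrative only: choose_action is a hypothetical routine, entry points are not modelled, and the condition values are taken directly from the inputs, so short-circuit evaluation is ignored.

```python
from itertools import product


def choose_action(error_large, steps_left):
    """Hypothetical routine: one decision made up of two conditions."""
    action = "stop"
    if error_large and steps_left:
        action = "refine"
    return action


def decision_outcomes(cases):
    """Set of outcomes (True/False) that the decision takes over the cases."""
    return {choose_action(a, b) == "refine" for a, b in cases}


def statement_coverage(cases):
    """All statements run only if the decision is true at least once."""
    return True in decision_outcomes(cases)


def decision_coverage(cases):
    """The decision must evaluate to both True and False."""
    return decision_outcomes(cases) == {True, False}


def condition_coverage(cases):
    """Each individual condition must take both values."""
    return ({a for a, _ in cases} == {True, False}
            and {b for _, b in cases} == {True, False})


def multiple_condition_coverage(cases):
    """Every combination of condition values must appear."""
    return set(cases) == set(product([True, False], repeat=2))


one_case = [(True, True)]
two_cases = [(True, False), (False, True)]
all_cases = list(product([True, False], repeat=2))

for name, cases in [("one", one_case), ("two", two_cases), ("all", all_cases)]:
    print(name, statement_coverage(cases), decision_coverage(cases),
          condition_coverage(cases), multiple_condition_coverage(cases))
```

A single test case is enough for statement coverage here but leaves the false branch untested. The pair of cases in two_cases satisfies condition coverage without ever making the decision true, which illustrates why the combined decision-condition measure exists.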
White Box testing is very useful in ensuring that the test cases cover sufficient ground. It increases
the probability of exposing errors, and prevents the problem of "pseudo success", a situation where
developers are misled by a 100% passing rate when only a small portion of the program is tested.
Since developers would have to analyse and understand implementation code in order to derive test
cases, the White Box method comes with an added bonus of encouraging code inspection. Many errors
and issues (such as unnecessary or over-complicated code) could be identified during this stage even
before the tests are actually run.
As useful as it might be, White Box testing does have some limitations. It cannot show that the
program being tested meets its specifications; neither can it identify missing paths or data-dependent
problems. Therefore, it should be noted that this method would not by itself produce a complete solution,
but should be used in conjunction with Black Box testing.
3.3 Deciding on a strategy
The methodologies discussed above are generic, and can be applied to any of the testing stages defined
in section 3.1.2. It is up to the developer to determine an overall strategy based on a combination of
methodologies tailored to suit specific circumstances. A good strategy would be one that can strike a
balance between the thoroughness of the tests and the resources invested in developing and executing the
tests.
An example of a test design strategy that might be drawn up by a developer would be:
1. Generate test cases based on all specification documents relevant to the code being tested. If the
specification is in the form of input-output conditions, start with the cause-effect graphing technique.
2. Supplement that with boundary-value analysis and equivalence partitioning. Ensure that both valid
and invalid conditions are covered.
3. Generate additional test cases based on error-guessing.
4. Add further test cases (based on analysing the code) until 90% statement coverage is achieved (footnote 3).
Footnote 3: A statement coverage of 85-95% is usually a good initial target. However, one would ideally
aim for full statement coverage and a sufficient level of coverage for Decision and/or Condition.
4 Available Tools
This chapter presents a survey of software testing tools currently available for software written in Fortran.
We have broken down the list into several categories:
- Testing Framework
- Capture and Playback
- Output Validation
- Test Coverage
- Test Management and Automation
- Build Management
The descriptions of the tools were adapted from text available on their respective websites.
4.1 Testing Framework
Testing frameworks accelerate the testing process by providing developers with tools that assist in the
development and deployment of tests.
While each of the frameworks employs a different approach, all of them should provide the following:
- Tools and libraries for writing test suites (collections of test cases),
- A mechanism for setting up and tearing down a testing runtime environment,
- A standardised form of reporting and managing test results.
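These three ingredients are common to xUnit-style frameworks in most languages. As an illustration only, the sketch below shows them in Python's standard unittest module rather than in any of the Fortran frameworks described in this section. The test class, the file name input.dat and its contents are hypothetical.

```python
import os
import tempfile
import unittest


class InputFileTests(unittest.TestCase):
    """A test suite: a collection of related test cases."""

    def setUp(self):
        # Set-up: create a scratch directory and an input file for each test.
        self.workdir = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.workdir.name, "input.dat")
        with open(self.path, "w") as handle:
            handle.write("1.0 2.0 3.0\n")

    def tearDown(self):
        # Tear-down: remove the scratch directory, whatever the test outcome.
        self.workdir.cleanup()

    def test_three_values_are_read(self):
        with open(self.path) as handle:
            values = [float(v) for v in handle.read().split()]
        self.assertEqual(values, [1.0, 2.0, 3.0])

    def test_file_is_not_empty(self):
        self.assertGreater(os.path.getsize(self.path), 0)


def run_suite():
    """Run the suite and return the result object used for reporting."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InputFileTests)
    return unittest.TextTestRunner(verbosity=2).run(suite)
```

The runner prints one line per test in a uniform format, and the returned result object (with attributes such as testsRun, failures and errors) is what a test management tool would consume.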
4.1.1 pFUnit
Available from: http://sourceforge.net/projects/pfunit/
More info at: http://opensource.gsfc.nasa.gov/projects/funit/pfunit.php
License: NASA Open Source Agreement (NOSA)
Description
The goals of the pFUnit project are to provide a shared mechanism for supporting unit testing within the
HPC community in the hope of encouraging best practices for development and maintenance of software.
In particular, pFUnit aims to be sufficiently minimal to encourage rapid adoption while still providing a
minimum threshold of functionality. By providing pFUnit as open source, we hope to leverage interest
from other groups to enhance portability and usability.
pFUnit is a Fortran analogue to various other xUnit testing frameworks which have been developed
within the software community, and is intended to enable test driven development (TDD) within the
scientific/technical programming community.
It was written (almost) entirely in standard conforming Fortran 95, and was developed using TDD
methodology. pFUnit is bundled with an extensive set of self-tests which are intended to evolve along
with the primary package.
pFUnit includes scripts which can conveniently wrap user-written tests into test suites and assemble
those suites into an executable. The lack of true object-orientation and reflection within Fortran
necessitates this approach. Nonetheless, once pFUnit has been added to a developer's build system, adding
and running additional tests requires minimal effort. The executable itself is, at least for the moment,
command-line driven. If all tests pass, a simple summary of the number of tests run is returned. If some
tests fail, a summary of which tests failed and any associated messages is returned to standard output.
Features that will be of particular interest to the developer of scientific applications include:
- Extensive sets of assert routines for floating point, including support for single and double precision,
  multidimensional arrays, and various means of expressing tolerances.
- Ability to launch MPI tests and report results back as a single test - an essential feature for high-end
  computing associated with weather and climate modelling.
- Ability to repeat tests across a complex high-dimensional parameter space. The need for this
  capability arises when multiple input parameters strongly interact within a subsystem. The ability
  to balance performance concerns against the need to adequately sample the possibilities is very
  useful. Failing parameter tests report back which combinations of parameters resulted in failures.
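Two of these ideas, tolerance-based floating point assertions and repeating a test across a parameter space, can be illustrated independently of pFUnit. The sketch below is written in Python and does not use or imitate the pFUnit API; assert_close, kinetic_energy and sweep are hypothetical names introduced for this example.

```python
import itertools
import math


def assert_close(actual, expected, rel_tol=1e-9, abs_tol=0.0):
    """Raise AssertionError unless the two values agree within tolerance."""
    if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
        raise AssertionError(f"{actual!r} != {expected!r} within tolerance")


def kinetic_energy(mass, velocity):
    """Hypothetical routine under test."""
    return 0.5 * mass * velocity ** 2


def sweep(masses, velocities):
    """Run the same check over every parameter combination.

    Returns the list of (mass, velocity) combinations that failed.
    """
    failures = []
    for mass, velocity in itertools.product(masses, velocities):
        # Independent reference expression for the same quantity.
        expected = mass * velocity * velocity / 2.0
        try:
            assert_close(kinetic_energy(mass, velocity), expected, rel_tol=1e-12)
        except AssertionError:
            failures.append((mass, velocity))
    return failures


print(sweep([0.1, 1.0, 7.3], [0.0, 2.5, 1.0e3]))
```

Comparing with a tolerance rather than for exact equality matters because two mathematically equivalent expressions can differ in the last few bits of their floating point results. Returning the failing combinations, rather than stopping at the first failure, corresponds to the reporting behaviour described in the last feature above.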
4.1.2 fUnit
Available from: http://funit.rubyforge.org/
License: NASA Open Source Agreement (NOSA)
Description
fUnit is a unit testing framework for Fortran modules.
Unit tests are written in Fortran fragments that use a small set of testing-specific keywords and
functions. fUnit transforms these fragments into valid Fortran code and compiles, links, and runs them
against the module under test.
fUnit is opinionated software which values convention over configuration. Specifically, fUnit requires
a Fortran 95 compiler, it only supports testing routines contained in modules, it requires tests to be
stored alongside the code under test, and it requires that you follow a specific naming rule for test files.
The requirements for using fUnit are:
A Fortran 90/95/2003 compiler (set via FC environment variable)
The Ruby language with the RubyGems package manager
4.1.3
DejaGNU
Available from http://www.gnu.org/software/dejagnu/

License
GNU General Public License (GPL)
Description
DejaGnu is a framework for testing other programs. Its purpose is to provide a single front end for all
tests. Think of it as a custom library of Tcl procedures crafted to support writing a test harness. A Test
Harness is the testing infrastructure that is created to support a specific program or tool. Each program
can have multiple testsuites, all supported by a single test harness. DejaGnu is written in Expect, which
in turn uses Tcl (Tool command language).
DejaGnu offers several advantages for testing:
The flexibility and consistency of the DejaGnu framework make it easy to write tests for any
program.
DejaGnu provides a layer of abstraction which allows you to write tests that are portable to any
host or target where a program must be tested. For instance, a test for GDB can run (from any
Unix based host) on any target architecture that DejaGnu supports. Currently DejaGnu runs tests
on several single board computers, whose operating software ranges from just a boot monitor to a
full-fledged, Unix-like realtime OS.
All tests have the same output format. This makes it easy to integrate testing into other software
development processes.
Using Tcl and Expect, it's easy to create wrappers for existing testsuites. By incorporating existing
tests under DejaGnu, it's easier to have a single set of report analysis programs.
Running tests requires two things: the testing framework and the testsuites themselves. Tests are
usually written in Expect using Tcl, but you can also use a Tcl script to run a testsuite that is not based
on Expect.
4.1.4
QMTest
Available from http://www.codesourcery.com/qmtest/

License
Description
QMTest is a cost-effective general purpose testing solution that can be used to implement a robust, easy-to-use testing process. QMTest runs on Windows and on most UNIX-like operating systems including
GNU/Linux.
QMTest's extensible architecture allows it to handle a wide range of application domains: everything
from compilers to graphical user interfaces to web-based applications. QMTest can easily compare test
results to known-good baselines, making analysing test results far simpler. And, because QMTest runs
on virtually all operating systems, you can use it with your entire product line.
4.1.5
Cleanscape Grayboxx
Vendor
Cleanscape Software International
http://www.cleanscape.net/products/grayboxx/index.html
Description
A complete software life-cycle testing toolset developed for software written in C, Fortran, Ada, and
Assembly. Grayboxx provides a complete software testing solution that verifies functional and structural
performance requirements for mission critical applications. Grayboxx automatically conducts the following test methodologies: Blackbox Testing, Whitebox Testing, Regression Testing, Assertion Testing, and
Mutation Testing.
Grayboxx speeds the development process by allowing developers and test engineers to automatically:
Generate test cases
Conduct coverage analysis with complexity metrics
Conduct unit performance testing with no probe insertions
Generate test stubs
Generate test harnesses
Execute tests
Prepare modules
Verify results
Grayboxx also allows for both full and partial regression testing, so the tester can either run the same
tests again in full or name the individual tests to run as a subset of the test cases.
4.2
Capture and Playback
Capture and Playback tools provide an alternative mechanism for testing applications. Instead of having
developers write test cases using scripts or code, these tools enable test cases to be created by
recording the input given during a program execution. This input can then be played back during the test run,
and the output compared with the expected output.
These tools are independent of the programming language used in the program, and are very useful
for testing interactive programs (command-line interfaces or graphical user interfaces). These tools may
be a quick solution to preparing functional or acceptance tests.
4.2.1
AutoExpect
Available from : http://expect.nist.gov/

License
None (public domain)
Description
Expect is a tool for automating interactive programs. It is possible to make very sophisticated Expect
scripts. For example, different patterns can be expected simultaneously either from one or many processes,
with different actions in each case. Traditional control structures such as if/then/else, procedures, and
recursion are available.
Expect's language facilities are provided by Tcl, a very traditional scripting language. Traditionally,
users write Expect scripts by studying the interaction to be automated and writing the corresponding
Expect commands to perform the interaction. Using AutoExpect, this stage can be automated.
AutoExpect, which is part of the Expect distribution, is a program which watches a user interacting
with another program and creates an Expect script that reproduces the interactions.
For testing an interactive application, the Expect script generated by AutoExpect can be modified
and repeatedly played back with different input values.
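The playback half of this idea can be sketched without Expect. The Python below is only an illustration under simplifying assumptions: the program under test is a hypothetical one-line script, the "recording" is a pair of strings, and input is piped in all at once rather than exchanged through a terminal as Expect does. The function name `replay` is likewise hypothetical.

```python
import subprocess
import sys

# Hypothetical program under test: reads two numbers and prints their sum.
PROGRAM = [sys.executable, "-c", "a = float(input()); b = float(input()); print(a + b)"]


def replay(command, recorded_input, expected_output, timeout=10):
    """Feed recorded input to the program and compare its output with the recording."""
    result = subprocess.run(
        command, input=recorded_input, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout.strip() == expected_output.strip()


# A "recording" here is just the captured input and the output seen at capture time.
passed = replay(PROGRAM, "1.5\n2.0\n", "3.5\n")
```

Replaying the same recording with modified input values only requires changing the two strings, which mirrors how a generated Expect script can be edited and rerun.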
4.2.2
TestWorks CAPBAK
Vendor
Software Research, Inc. (http://www.soft.com/Products/stwindex.html)
Description
CAPBAK is a capture/playback tool system which allows the user to record mouse movements, keyboard
activities, widget calls and verification information into a test script language for later use. CAPBAK
supports an automatic synchronisation feature that handles minor application changes and time-sensitive
operations.
Captured images and/or character patterns provide baselines against which future runs of the tests
are compared. CAPBAK/X's automatic output synchronisation ensures reliable playback, allowing tests
to be run unsupervised as often as required.
With the Xvirtual display capability, test runs can be executed in the background, freeing the screen
for other activities. This capability can also be used to invoke a single application multiple times on the
same workstation, thus allowing for load testing in a client/server environment.
Used in conjunction with its TestWorks/Regression companion tools, EXDIFF and SMARTS, the
regression testing process can be completely automatic. The SMARTS test management system organises
CAPBAK/X's test sessions into a hierarchical structure for execution individually or as part of a test
suite. This process is based on the verification criteria selected for each test. Discrepancies are reported
by SMARTS for further analysis. Extraneous discrepancies can be masked during the comparison process
via EXDIFF. Following test execution, SMARTS logs the test statistics and generates PASS/FAIL results
into various standard reports.
4.3
Output validation
Output validation tools provide a convenient mechanism for comparing the results of a test run with
a gold standard or an acceptable value. They automate the tedious task of result validation,
especially for programs with lots of output, or for multiple runs with different values.
Tools such as ndiff and numdiff are particularly useful for scientific software as they can be configured with acceptable error tolerances when performing comparisons of numerical data. This prevents tiny
deviations in floating point data from being unnecessarily flagged as errors.
4.3.1
TextTest
Available from: http://sourceforge.net/projects/texttest

Project website: http://texttest.carmen.se/
License
OSI-Approved Open Source
Description
TextTest works by comparing plain text logged by programs with a previous gold standard version of
that text.
The focus is on testing a particular executable program with a variety of inputs. To start with,
a plain text configuration file is created that tells TextTest about your program, how to run it, and how
to test it. Tests (and test suites) are then defined entirely using plain text files in a directory structure.
A test is defined partly by the expected files and their contents that should be produced, and partly
by the input to provide, which can consist of any or all of:
Options to be provided on the command line
A file to be redirected to standard input
Environment variables that should be set
A sequence of use-case actions to be performed on a GUI
Any output at all can be compared, so long as it is plain text, or can be converted to it.
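The core comparison can be sketched in a few lines. This is not TextTest's implementation or configuration format; it is a minimal illustration using Python's standard `difflib` module, and the function name `compare_with_gold` is hypothetical.

```python
import difflib


def compare_with_gold(actual_text, gold_text):
    """Return a unified diff as a list of lines; an empty list means the test passes."""
    return list(
        difflib.unified_diff(
            gold_text.splitlines(),
            actual_text.splitlines(),
            fromfile="gold",
            tofile="actual",
            lineterm="",
        )
    )
```

A test passes when the returned list is empty. Otherwise the diff shows exactly which logged lines changed relative to the gold standard, and the tester decides whether to fix the program or accept the new output as the gold standard.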
4.3.2
ndiff
Available from: http://www.math.utah.edu/~beebe/software/ndiff/

License
Description
Suppose that you have just run the same numerical program in two different environments, perhaps
different compilers on the same system, or on different CPU architectures or operating systems. You run
diff on the two program output text files, but there are thousands of differences reported. Is the program
behaving acceptably, or are there real errors that you must deal with, perhaps due to architectural
assumptions in your code, or to numerical instabilities in your algorithms, or errors in compiler-generated
code, or errors or inaccuracies in run-time libraries?
ndiff is a very useful tool for solving problems like that. Simply put, it assumes that you have two text
files containing numerical values, and the two files are expected to be identical, or at least numerically
similar. ndiff allows you to specify absolute and/or relative error tolerances for differences between
numerical values in the two files, and then reports only the lines with values exceeding those tolerances.
It also tells you by how much they differ.
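The kind of comparison described here can be sketched as follows. This is not ndiff's implementation or its command-line interface; the function name `numeric_diff` and its parameters are hypothetical, and the sketch assumes whitespace-separated fields.

```python
import itertools
import math


def numeric_diff(text_a, text_b, abs_tol=0.0, rel_tol=0.0):
    """Return (line_number, field_a, field_b) for fields differing beyond tolerance."""
    differences = []
    lines = itertools.zip_longest(text_a.splitlines(), text_b.splitlines(), fillvalue="")
    for line_no, (line_a, line_b) in enumerate(lines, start=1):
        fields_a, fields_b = line_a.split(), line_b.split()
        if len(fields_a) != len(fields_b):
            # Different layout (or a missing line): report the whole line.
            differences.append((line_no, line_a, line_b))
            continue
        for field_a, field_b in zip(fields_a, fields_b):
            try:
                same = math.isclose(
                    float(field_a), float(field_b), rel_tol=rel_tol, abs_tol=abs_tol
                )
            except ValueError:
                same = field_a == field_b  # non-numeric fields must match exactly
            if not same:
                differences.append((line_no, field_a, field_b))
    return differences
```

With both tolerances left at zero the comparison is still numeric, so `1.0` and `1.00` are treated as equal; raising `rel_tol` or `abs_tol` suppresses the small deviations that typically arise between compilers or architectures.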
4.3.3
Toldiff
Available from: http://sourceforge.net/projects/toldiff/

License
MIT License
Description
Toldiff is a diff tool that allows tolerable (insignificant) differences between two files to be suppressed,
showing only the important ones. The tolerable differences are recorded by running the tool with an appropriate command line flag.
4.3.4
numdiff
Available from: http://www.nongnu.org/numdiff/

License
Description
Numdiff is a little program that can be used to compare putatively similar files line by line and field
by field, ignoring small numeric differences and/or different numeric formats. Equivalently, Numdiff is
a program that can appropriately compare files containing numerical fields (as well as other fields).
By default, Numdiff assumes the fields are separated by white space (blanks, horizontal tabs and
newlines), but users can also specify their own list of separators.
When you compare a pair of such files, what you usually want is a list of the numerical
fields in the second file which differ numerically from the corresponding fields in the first file. Well-known
tools like diff, cmp or wdiff cannot be used for this purpose: they cannot recognise whether a difference
between two numerical fields is only due to the notation or is actually a difference in numerical value.
Moreover, you may also want to ignore differences in numerical values as long as they do not exceed
a certain threshold. In other words, you may wish to neglect all small numerical differences too.
4.4
Test Coverage
Code coverage of test suites can be determined using dynamic analysis tools, some of which are listed
in this section. This information is useful when determining the thoroughness of the test cases, and is
often used when designing test cases using the White Box method (see section 3.2.2).
Using code coverage tools, developers can determine:
Cold spots: Parts of the code that are never used, or just not used by the test cases.
Hot spots: Parts of the code that are used frequently.
New test cases: Tests needed to exercise a part of the code not already tested.
This kind of testing is really a test of the completeness of the test cases, i.e. whether they exercise all parts
of the code, but it also tests the code itself indirectly. If all test cases share the same cold spot(s) then
perhaps that code can be removed, and if there is a common hot spot then this is an area to study in detail
to find ways of making it more efficient.
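To make the idea of cold and hot spots concrete, the following Python sketch counts how often each line of one function executes, using the standard `sys.settrace` hook. It illustrates the principle only and is not how any of the tools below are implemented; `count_line_executions` and `classify` are hypothetical names.

```python
import sys
from collections import Counter


def count_line_executions(func, *args):
    """Call func(*args) and count how often each of its source lines runs.

    Lines are keyed by their offset from the function's `def` line.
    """
    counts = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace any other function
        if event == "line":
            counts[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return counts


def classify(x):           # offset 0
    if x < 0:              # offset 1
        return "negative"  # offset 2
    return "non-negative"  # offset 3


# Accumulate counts over a small "test suite" of inputs.
total = Counter()
for test_input in (5, 0, 12):
    total += count_line_executions(classify, test_input)
```

After these three inputs the line at offset 2 has a count of zero: it is a cold spot, and a new test case with a negative input is needed to exercise it.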
4.4.1
gcov
Available from: http://sourceforge.net/project/showfiles.php?group_id=3382

License
Description
gcov is a test coverage program. Use it in concert with GNU CC to analyse your programs to help create
more efficient, faster running code. You can use gcov as a profiling tool to help discover where your
optimisation efforts will best affect your code. You can also use gcov along with the other profiling tool,
gprof, to assess which parts of your code use the greatest amount of computing time.
Profiling tools help you analyse your code's performance. Using a profiler such as gcov or gprof, you
can find out some basic performance statistics, such as:
how often each line of code executes
what lines of code are actually executed
how much computing time each section of code uses
Once you know these things about how your code works when compiled, you can look at each module to
see which modules should be optimized. gcov helps you determine where to work on optimization.
Software developers also use coverage testing in concert with testsuites, to make sure software is
actually good enough for a release. Testsuites can verify that a program works as expected; a coverage
program tests to see how much of the program is exercised by the testsuite. Developers can then determine
what kinds of test cases need to be added to the testsuites to create both better testing and a better final
product.
Output from gcov can be visualised using tools such as ggcov (http://ggcov.sourceforge.net/)
and lcov (http://ltp.sourceforge.net/coverage/lcov.php).
4.4.2
Polyhedron plusFort - CVRANAL
Vendor
Polyhedron Software
http://www.polyhedron.co.uk/pf/pfqa.html#coverage
Description
The plusFort package includes CVRANAL, a coverage analysis facility that places probes into Fortran
source code which allow users to monitor the effectiveness of testing. At the end of each run, the probes
update the coverage statistics for each source file. This data may be analysed at any time using the
CVRANAL tool. CVRANAL identifies untested code blocks, and execution hot-spots.
In addition, CVRANAL can annotate your source code. The annotations are comments and do not affect the validity of the source code.
4.4.3
FCAT
Available from http://www.dl.ac.uk/TCSC/UKHEC/FCAT/

Description
FCAT is similar to CVRANAL in that it reports the execution count for each line of executable source
code.
FCAT (FORTRAN Coverage Analysis Tool) is used for the coverage analysis of FORTRAN codes. It can be used for:
finding cold spots in Fortran codes (the parts of the code that are never executed), flagging
these parts line by line.
finding hot spots in Fortran codes (the parts of the code that are most frequently executed),
giving a line-by-line profile.
It is designed to work mainly with F90/F95, although it also works with fixed-format
FORTRAN, and thus with F77.
FCAT offers some facility for the coverage analysis of parallel codes. It treats a line as being executed
if at least one processor has executed it. The counter for the line is taken as the maximum of the number
of times this line has been executed over all processors.
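The merging rule described above is simple enough to state as code. The sketch below is an illustration in Python, not FCAT's own implementation; it assumes each processor's counts are available as a mapping from line number to execution count, and the function name is hypothetical.

```python
def merge_parallel_counts(per_process_counts):
    """Combine per-processor line counts by taking the maximum count for each line."""
    merged = {}
    for counts in per_process_counts:
        for line, count in counts.items():
            merged[line] = max(merged.get(line, 0), count)
    return merged


# Line 12 ran only on the second processor, yet it still counts as executed.
merged = merge_parallel_counts([{10: 4, 11: 0}, {10: 1, 11: 0, 12: 7}])
executed_lines = sorted(line for line, count in merged.items() if count > 0)
```

Taking the maximum rather than the sum keeps the reported count independent of the number of processors used for the run.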
4.4.4
Cleanscape Grayboxx
Vendor
Cleanscape Software International
http://www.cleanscape.net/products/grayboxx/index.html
Description
A complete software life-cycle testing toolset that includes coverage analysis with complexity metrics. It
can perform the following coverage functions:
Measure test effectiveness and reliability of testing by analysing application source code
Set up test cases and measure their efficiency
Consolidate results of test coverage measurements for several scenarios or during a test campaign
Enable effective visualisation of covered and uncovered source code
4.4.5
TestWorks/TCAT
Vendor
Software Research, Inc.
http://www.soft.com/Products/stwindex.html
Description
TCAT and S-TCAT, branch-level unit-test and system-test coverage analysis tools, provide branch and
call-pair coverage for F77 and Ada programs. TCAT and S-TCAT measure the number of times each
segment or function-call pair is exercised.
C1 expresses test effectiveness as the percentage of segments exercised in a program by a test
suite, relative to the number of such segments existing in the system.
S1 expresses test effectiveness as the percentage of function-calls exercised in a program by a test
suite, relative to the number of such function-calls existing in the system.
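Both metrics are the same ratio applied to different items. As a minimal sketch (the function names and the dictionary of hit counts are assumptions made for illustration, not TCAT's data format), they can be computed from per-item hit counts like this:

```python
def coverage_percent(exercised, total):
    """Percentage of items (segments for C1, function-calls for S1) hit at least once."""
    if total == 0:
        raise ValueError("no segments or function-calls to cover")
    return 100.0 * exercised / total


def coverage_from_hits(hit_counts):
    """Compute the percentage from a mapping of item name -> number of times exercised."""
    exercised = sum(1 for hits in hit_counts.values() if hits > 0)
    return coverage_percent(exercised, len(hit_counts))


# Four segments, two of which were exercised by the test suite.
c1 = coverage_from_hits({"seg1": 3, "seg2": 0, "seg3": 1, "seg4": 0})
```

Note that an item exercised many times contributes no more to the percentage than one exercised once; the hit counts matter only for identifying frequently tested items.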
TCAT and S-TCAT instrument the application by placing markers at each segment or function-call.
When test cases have been run against the instrumented code, TCAT/S-TCAT collects and stores test
coverage data in a tracefile. TCAT/S-TCAT then extracts this information to create coverage reports
indicating which calls remain untested or frequently tested, and which test cases duplicate coverage.
TCAT/S-TCAT also creates archive files containing cumulative test information.
The instrumentation process also generates call-trees that identify a program's modules and represent
the caller-callee hierarchical structure (as well as subtrees of caller-callee dependencies) within a program.
Using optional user annotation and/or supplied colour annotation, the call-tree shows each function's
level of interface exercise. When a function-call coverage value is low, the user can navigate directly to
the corresponding source code. The call-trees can also generate directed graphs depicting the control-flow
structure for individual modules.
4.4.6
LDRA Testbed
Vendor
LDRA
http://www.ldra.co.uk/testbed.asp
Description
LDRA Testbed's Dynamic Analysis tool provides coverage analysis at the following levels:
Statement Coverage
Branch/Decision Coverage
LCSAJ Coverage
MC/DC Coverage
Dynamic Data Flow Coverage
4.4.7
McCabe IQ
Vendor
McCabe Software
http://www.mccabe.com/iq_test.htm
Description
McCabe IQ provides comprehensive test / code coverage to focus, monitor, and document software testing
processes. Using industry-standard testing methods and advanced dynamic analysis techniques, McCabe
IQ accurately assesses the thoroughness of your testing and aids in gauging the time and resources needed
to ensure a well-tested application.
McCabe IQ provides multiple levels of test coverage at the unit, integration, and regression test phases,
including module, lines of code, branch, path, Boolean (MC/DC for DO-178B test verification), data,
class (OO), and architectural coverage.
4.5
Test Management and Automation
As the number of test cases for each project grows, it becomes increasingly important for these tests to
be organised and automated as much as possible. Most test management and automation tools
provide the following functionality:
Organisation of information such as software requirements, test plans, and test cases.
Test results tracking.
Automated execution of tests. Test runs can be periodic, or triggered by events such as changes to
the source tree.
Reports and statistics generation.
4.5.1
RTH
Available from: http://www.rth-is-quality.com

License
Description
rth is a web-based tool designed to manage requirements, tests, test results, and defects throughout the
application life cycle. The tool provides a structured approach to software testing and increases the visibility of the testing process by creating a common repository for all test assets including requirements,
test cases, test plans, and test results. Regardless of their geographic location, rth allows testers, developers, business analysts, and managers to monitor and gauge application readiness. The tool includes
modules for requirements management, test planning, test execution, defect tracking, and reporting.
Benefits of RTH include:
Working in remote locations is no longer a problem. View the status of your project on the web
View progress of requirements, test execution, and bug status in real-time
All documents (requirements, tests, test plans, supporting documents) are stored under version
control
Store record or file-based requirements based on your reporting needs
Test Tool agnostic! Take advantage of test automation with three simple functions that allow you
to write automated test results to rth
Post and report on both manual and automated test results
4.5.2
TestLink
Available from: http://testlink.org

License
Description
TestLink is an open source, web-based test management and test execution system which allows quality
assurance teams to create and manage their test cases as well as organise them into test plans. These
test plans allow team members to execute test cases and track test results dynamically, generate reports,
trace software requirements, prioritise and assign.
The tool is based on PHP and MySQL, and includes several other open source tools. It also supports bug
tracking systems such as Bugzilla or Mantis.
In short, TestLink allows users to:
Collect and organise test cases dynamically
Track results and metrics associated with test execution
Track specific information about individual tests
Capture and report details to assist in conducting a more thorough testing process
Customise TestLink to fit requirements and processes
4.5.3
QaTraq
Available from: http://www.testmanagement.com

License
GNU General Public License (GPL) with options for commercial upgrades.
Professional upgrades include additional modules for extended graphical reporting capabilities and
extensible scripting functionalities.
Vendor
Traq Software Ltd.
Description
QaTraq Test Case Management Tool allows you to consolidate the manual and functional software testing
process. With one functional software testing management tool we give you the control to automate your
own techniques and strategies to track your testing, from the planning stages right through to the test
completion reporting. From communicating your test plans to managing the functional coverage of your
software testing QaTraq can help you gain control of the whole manual and functional software testing
process without changing your own strategies or techniques.
Amongst other things QaTraq can provide you with:
Improved co-ordination between testers, team leaders and managers
A repository of your entire manual testing progress
A knowledge base of technical testing to share amongst a test team
A formal channel for developers and testers to suggest tests
Accurate tracking of your functional software testing
Instant reports based on test cases created and executed
Statistics listing the testing which is most effective
Control of your manual and functional software testing
4.5.4
AutoTest
Available from http://eiffelzone.com/esd/tstudio/
License
Eiffel Forum License, version 2
http://www.opensource.org/licenses/ver2_eiffel.php
Description
AutoTest (formerly TestStudio) is a fully automatic testing tool based on Design by Contract.
Contracts are a valuable source of information regarding the intended semantics of the software. The
information that contracts (preconditions, postconditions, class invariants, loop variants and invariants,
and check instructions) provide can be used to check whether the software fulfils its intended purpose.
By checking that the software respects its contracts, we can ascertain its validity. Therefore, contracts
provide the basis for automation of the testing process.
AutoTest allows the user to generate, compile and run tests at the push of a button.
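The way contracts can drive automatic testing can be sketched outside Eiffel. The Python below is an illustration only and does not use AutoTest or Eiffel's contract syntax: the precondition and postcondition are written as ordinary run-time checks, inputs are generated at random, and the names `checked_sqrt` and `auto_test` are hypothetical.

```python
import math
import random


def checked_sqrt(x):
    """Square root with its contract written as executable checks."""
    if x < 0:
        raise ValueError("precondition violated: x must be non-negative")
    result = math.sqrt(x)
    # Postcondition: squaring the result recovers x to within rounding error.
    assert math.isclose(result * result, x, rel_tol=1e-12, abs_tol=1e-300), \
        "postcondition violated"
    return result


def auto_test(routine, generate_input, trials=1000, seed=0):
    """Call routine on generated inputs; return the inputs that broke a postcondition."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        value = generate_input(rng)
        try:
            routine(value)
        except ValueError:
            pass  # precondition not met: the input is invalid, not a bug
        except AssertionError:
            failures.append(value)
    return failures


failures = auto_test(checked_sqrt, lambda rng: rng.uniform(-10.0, 1e6))
```

No expected outputs are written by hand: the postcondition acts as the test oracle, and the precondition filters out inputs the routine is not obliged to handle.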
4.5.5
STAF
Available from: http://staf.sourceforge.net

License
Common Public License (CPL) V1.0.
http://www.opensource.org/licenses/cpl1.0.php
Description
The Software Testing Automation Framework (STAF) is an open source, multi-platform, multi-language
framework designed around the idea of reusable components, called services (such as process invocation,
resource management, logging, and monitoring). STAF removes the tedium of building an automation
infrastructure, thus enabling you to focus on building your automation solution. The STAF framework
provides the foundation upon which to build higher level solutions, and provides a pluggable approach
supported across a large variety of platforms and languages.
STAF can be leveraged to help solve common industry problems, such as more frequent product
cycles, less preparation time, reduced testing time, more platform choices, more programming language
choices, and increased National Language requirements. STAF can help in these areas since it is a proven
and mature technology, promotes automation and reuse, has broad platform and language support, and
provides a common infrastructure across teams.
STAX is an execution engine which can help you thoroughly automate the distribution, execution,
and results analysis of your testcases. STAX builds on top of three existing technologies, STAF, XML,
and Python, to place great automation power in the hands of testers. STAX also provides a powerful GUI
monitoring application which allows you to interact with and monitor the progress of your jobs. Some
of the main features of STAX are: support for parallel execution, user-defined granularity of execution
control, support for nested testcases, the ability to control the length of execution time, the ability to
import modules at run-time, support for existing Python and Java modules and packages, and the ability
to extend both the STAX language as well as the GUI monitoring application. Using these capabilities,
you can build sophisticated scripts to automate your entire test environment, while ensuring maximum
efficiency and control.
Other STAF services are also provided to help you to create an end-to-end automation solution. By
using these services in your test cases and automated solutions, you can develop more robust, dynamic
test cases and test environments.
4.6
Build Management
Build management systems automate the update-compile-test cycle of a software project. Changes to
the source code trigger a rebuild of the application and cause the tests to be executed. This allows
developers to get immediate feedback if an error occurs, and ensures that problems are detected as early
as possible.
Other tasks, such as software quality assurance analysis, can be included within the list of actions to
be performed. This ensures that the updated source code is not only correct, but is also of good quality
and adheres to standards defined for the project.
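The cycle described above can be sketched as a single function. This Python sketch is not taken from any of the tools below: the source tree is modelled as a mapping from file name to contents, and each step (compile, test, quality check) is a hypothetical callable returning True on success, where a real system would run external commands.

```python
import hashlib


def tree_fingerprint(sources):
    """Hash a mapping of file name -> file contents (bytes) to detect changes."""
    digest = hashlib.sha256()
    for name in sorted(sources):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(sources[name])
    return digest.hexdigest()


def build_cycle(sources, last_fingerprint, steps):
    """Run the steps in order if the sources changed; stop at the first failure.

    Returns (fingerprint, report); report is None when nothing has changed.
    """
    fingerprint = tree_fingerprint(sources)
    if fingerprint == last_fingerprint:
        return fingerprint, None
    report = []
    for name, step in steps:
        ok = step(sources)
        report.append((name, ok))
        if not ok:
            break  # later steps are skipped once one fails
    return fingerprint, report
```

Calling `build_cycle` periodically, or from a version control hook, with the fingerprint returned by the previous call gives the update-compile-test behaviour; the report is what would be sent back to developers as feedback.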
4.6.1
BuildBot
Available from: http://buildbot.sourceforge.net

License
Description
The BuildBot is a system to automate the compile/test cycle required by most software projects to validate
code changes. By automatically rebuilding and testing the tree each time something has changed, build
problems are pinpointed quickly, before other developers are inconvenienced by the failure. The guilty
developer can be identified and harassed without human intervention. By running the builds on a variety
of platforms, developers who do not have the facilities to test their changes everywhere before checkin
will at least know shortly afterwards whether they have broken the build or not. Warning counts, lint
checks, image size, compile time, and other build parameters can be tracked over time, are more visible,
and are therefore easier to improve.
The overall goal is to reduce tree breakage and provide a platform to run tests or code-quality checks
that are too annoying or pedantic for any human to waste their time with. Developers get immediate
(and potentially public) feedback about their changes, encouraging them to be more careful about testing
before checkin.
4.6.2
test-AutoBuild
Available from: http://www.autobuild.org/

License
Description
Test-AutoBuild is a framework for performing continuous, unattended, automated software builds. The
idea of Test-AutoBuild is to automate the building of a project's complete software stack on a pristine
system, from the high level applications, through the libraries and right down to the smallest part of the
toolchain.
4.6.3
Parabuild
Vendor
Viewtier Systems
http://www.viewtier.com/products/parabuild/
Description
Parabuild is a software build management server that helps software teams and organisations reduce the risk
of project failure by providing practically unbreakable daily builds and continuous integration builds.
Parabuild features an effortless installation process and easy overall use, multi-platform remote builds,
a fast Web user interface, and a wide range of supported version control and issue tracking systems.
4.6.4
CruiseControl
Available from: http://cruisecontrol.sourceforge.net

License
BSD-style License
http://www.opensource.org/licenses/bsd-license.php
Description
CruiseControl is composed of two main modules:
the build loop: the core of the system, it triggers build cycles and then notifies various listeners (users)
using various publishing techniques. The trigger can be internal (scheduled or upon changes in an
SCM) or external. It is configured in an XML file which maps the build cycles to certain tasks, thanks
to a system of plugins. Depending on configuration, it may produce build artefacts.
the reporting: allows the users to browse the results of the builds and access the artefacts.
4.6.5
BuildForge
Vendor
Recently acquired by IBM and incorporated into IBM's Rational Software suite.
http://www.buildforge.com
Description
IBM Rational Build Forge provides complete build and release process management through an open
framework that helps development teams standardise and automate tasks and share information. According
to the vendor, it can help clients accelerate software delivery, improve software quality, and meet audit and
compliance mandates.
4.6.6
AEGIS
Available from: http://aegis.sourceforge.net/

License
Description
Aegis is a transaction-based software configuration management system. It provides a framework within
which a team of developers may work on many changes to a program independently, and Aegis coordinates integrating these changes back into the master source of the program, with as little disruption
as possible.
While Aegis is not a build management system, we have included it in this section as it can be used
to the same effect: ensuring that code passes its checks (build, testing, QA) before being merged into the main
source tree.
Further reading
1. B. Kleb & B. Wood, Computational Simulations and the Scientific Method, NASA Langley Research
Center, (2005).
2. A.H. Watson & T.J. McCabe, Structured Testing: A Testing Methodology Using the Cyclomatic
Complexity Metric, National Institute of Standards and Technology, (1996).
3. G. Dodig-Crnkovic, Scientific Methods in Computer Science, Department of Computer Science,
Mälardalen University, (2002).
4. E. Dustin, Effective Software Testing: 50 Specific Ways to Improve Your Testing, Addison-Wesley
Professional, (2002).
5. B. Beizer, Software Testing Techniques, Van Nostrand Reinhold Co., (1990).
6. M. Fewster and D. Graham, Software Test Automation: Effective use of test execution tools,
Addison-Wesley, (1999).
7. S.M. Baxter, S.W. Day, J.S. Fetrow & S.J. Reisinger, Scientific Software Development Is Not an
Oxymoron, PLoS Comput Biol 2(9): e87, (2006).
8. D. Libes, How to Avoid Learning Expect or Automating Automating Interactive Programs, Proceedings of the Tenth USENIX System Administration Conference (LISA X), (1996).
9. S. Cornett, Code Coverage Analysis, Bullseye Testing Technology. http://www.bullseye.com/coverage.html
10. W.R. Bush, J.D. Pincus & D.J. Sielaff, A Static Analyzer for Finding Dynamic Programming Errors,
Intrinsa Corporation, (2000).
References
[1] D.J. Worth & C. Greenough, A Survey of Software Tools for Computational Science, Technical Report
RAL-TR-2006-011, CCLRC Rutherford Appleton Laboratory (2006).
[2] G.J. Myers, The Art of Software Testing, 2nd Edition, John Wiley & Sons Inc., (2004).