
Software Failures And Their Root Causes

Vinayak
Abstract: This document describes software failures and their root causes. The process of managing problems and defects involves many difficulties and challenges, but it has been given little consideration in software engineering research. The research work of this thesis is organized around four goals. The first goal is to study software quality assurance methods that can be used to detect defects. The second goal is to identify the difficulties that organizations have in managing software defects and the improvements that are needed to existing defect management models. The third goal is to study the concepts of service-oriented problem management. The fourth goal is to study the challenges and difficulties of service-oriented problem management methods.

Keywords: Software defects, Software failures.

I. INTRODUCTION

Software problems have become a part of our daily life. During both working hours and free time, we face numerous software problems, including application failures, security bugs, errors in user documentation, poor usability, and availability and performance problems associated with IT services. Problems such as these are very common because IT systems are used everywhere. The National Institute of Standards and Technology has estimated that software defects and problems cost the U.S. economy $59.5 billion annually. The rework in software projects (problem resolution and bug fixes) leads to higher software development and maintenance costs and higher prices for IT services and products. Besides the increasing number of problems and defects, the quality of the process of managing them also needs to be improved.

II. EVALUATION OF SOFTWARE DEFECTS

One useful way to evaluate software defects is to transfer process learning from individuals to organizations. This includes not only analysing software defects but also brainstorming the root causes of those defects and incorporating what we learn into training and process changes so that the defects won't occur again. There are five steps:

1. Extend defect data collection to include root-cause information. Start shifting from reactive responses to defects toward proactive responses.
2. Do failure analysis on representative organization-wide defect data. Failure analysis is the evaluation of defect patterns to learn process or product weaknesses.
3. Do root-cause analysis to help decide what changes must be made. Root-cause analysis is a group reasoning process applied to defect information to develop organizational understanding of the causes of a particular class of defects.
4. Apply what is learned to train people and to change development and maintenance processes.
5. Evolve failure analysis and root-cause analysis into an effective continuous process improvement process.

III. CAUSES OF SOFTWARE FAILURES

Because of the high number of software failures, many companies and individuals have researched their causes. The following are some of the causes they have found:

o Lack of communication between users and designers, resulting in a confusing specification.
o Incorrect documentation.
o The complexity of the software.
o Programming errors.
o Frequent changes in requirements: these can reduce the engineers' enthusiasm, and when there are many changes it can be very difficult to reconcile each new request with what has already been done; the complexity of handling the changes may result in errors.
o Time pressure.
o Individual egos: some designers look at a project and think it is very easy, so during the design process they may miss important code, or they may not read the specifications properly and not design according to them.
o Poorly documented code.
o Software development tools: libraries, scripting tools, and compilers can introduce their own bugs.
o Lack of resources.
o Inaccurate estimates of needed resources.
o Poor reporting of project status.
o Poor project management: most projects start off with small budgets, and the designers or developers then have to make up the shortfall by increasing productivity, reducing the scope of the effort, and taking risky shortcuts.
o Commercial pressures.
o Stakeholder politics.
o Cost of the project.
o Lack of training to use the software.
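
The failure analysis described in Section II can be illustrated with a minimal sketch, assuming that defect records have already been extended with a root-cause field. The record fields, the sample data, and the function name `failure_analysis` are hypothetical and exist only for illustration; the cause labels are borrowed from the list above. The sketch ranks root causes by frequency and reports a cumulative percentage, which is one simple way to spot the defect patterns worth a root-cause discussion.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectRecord:
    """A defect report extended with root-cause information (hypothetical fields)."""
    defect_id: str
    component: str
    root_cause: str  # category label, e.g. "programming error"


def failure_analysis(records):
    """Rank root causes by frequency.

    Returns a list of (cause, count, cumulative_percent) tuples,
    most frequent cause first. An empty input yields an empty list.
    """
    counts = Counter(record.root_cause for record in records)
    total = sum(counts.values())
    ranked = []
    cumulative = 0
    for cause, count in counts.most_common():
        cumulative += count
        ranked.append((cause, count, round(100 * cumulative / total, 1)))
    return ranked


# Illustrative records only; not data from any real project.
SAMPLE = [
    DefectRecord("D1", "billing", "programming error"),
    DefectRecord("D2", "billing", "programming error"),
    DefectRecord("D3", "ui", "confusing specification"),
    DefectRecord("D4", "billing", "programming error"),
    DefectRecord("D5", "reports", "confusing specification"),
    DefectRecord("D6", "ui", "time pressure"),
]

if __name__ == "__main__":
    for cause, count, cumulative in failure_analysis(SAMPLE):
        print(f"{cause:25s} {count:3d} {cumulative:6.1f}%")
```

A ranking like this only shows where defects cluster; deciding what process change follows from a cluster remains the group reasoning task of step 3.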
IV. SOFTWARE SYSTEM FAILURES

The following are a number of software failures that have occurred, with in-depth explanations of a few of them:

Heathrow Terminal 5 luggage system: The errors occurred because the building programme was completed late, which led to rushed testing and left some members of staff without the opportunity to be trained properly on the software. When the system was put into operation, many passengers were prevented from checking in their luggage, and many passengers on arriving flights were unable to retrieve theirs. This had a very big impact on operations at the airport, and it is still not clear who was at fault. The British Airports Authority (BAA) was in charge of the contracts with the software suppliers and was also responsible for the delayed building programme that resulted in rushed testing, so it took most of the blame for the failure; British Airways (BA), however, was in charge of training staff to use the system correctly. Research into the failure found that the airport management underestimated the completion time of the software project, and a lot of poor planning was discovered. The lesson learned is that a system should always be properly tested before it is deployed into operation. We also see that programming errors are not the only major cause of software failures: lack of training, poor communication, and time constraints contribute as well.

Ariane 5 Explosion: The Ariane 5 was a European rocket used to launch commercial payloads into Earth orbit. It was built as a successor to the Ariane 4 launchers, which had been successful in their launches, but the Ariane 5 could carry a heavier payload. The rocket was on its first flight, after about 10 years of development and about 7 billion invested in its design and construction, when, only about 35 seconds after a successful lift-off, it lost control because incorrect signals were sent to the engines. These imposed a lot of stress on the rocket, which caused it to break up, and it was finally destroyed. A 20-member team, split into 5 project teams, was set up to inquire into the failure. They found that the weather conditions at the launch site were good for take-off and that there was no possibility of lightning, as the strength of the electric field at the launch site was negligible; this led them to examine the operational facilities of the launcher. It was later found that the failure was due to specification and design errors in the software of the inertial reference system. The software tried to convert a 64-bit floating-point number relating to the horizontal velocity of the rocket into a 16-bit signed integer; the number was greater than 32,767, the maximum value a 16-bit signed integer can store, so the conversion failed. This caused the software to fail, which resulted in the loss of the rocket. The lessons learned here are:

o Efficient software quality testing should be put in place to avoid system failure.
o There should be no default exception-handling response in systems that have no backup state.
o All code should be checked before the system is put into use.
o Use real equipment and not simulations.
o Critical systems should be designed to avoid a single point of failure.
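
The conversion problem behind the Ariane 5 failure can be illustrated with a small sketch. This is not the flight software: the function names and the sample value 40000.0 are assumptions chosen for illustration. The sketch shows that a 16-bit signed integer can hold only values from -32,768 to 32,767, and contrasts a conversion that rejects out-of-range input with one that clamps it to the nearest representable value.

```python
INT16_MIN = -32768  # smallest value of a 16-bit signed integer
INT16_MAX = 32767   # largest value of a 16-bit signed integer


def to_int16_checked(value: float) -> int:
    """Convert a float to a 16-bit signed integer, rejecting overflow.

    The fractional part is truncated toward zero. Raises OverflowError
    when the truncated value does not fit in 16 bits.
    """
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Convert a float to a 16-bit signed integer, clamping to the valid range."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


if __name__ == "__main__":
    horizontal_velocity = 40000.0  # hypothetical out-of-range reading
    print(to_int16_saturating(horizontal_velocity))
    try:
        to_int16_checked(horizontal_velocity)
    except OverflowError as error:
        print("conversion rejected:", error)
```

Whether to reject or to clamp is a design decision that depends on what the rest of the system can safely do with the result; the point of the sketch is only that the out-of-range case has to be handled explicitly rather than left to a default response.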

Chemical Bank, New York: The company was founded in 1823 as the New York Chemical Manufacturing Company in Greenwich Village. It produced chemicals as well as medicines, paints and dyes. Its charter was amended in 1824 to allow banking practices, and it became known as Chemical Bank in 1844. In 1994, account holders discovered that money was being taken out of their accounts and made formal reports to the bank asking why they were being charged. The bank investigated the fault and found that the error was due to a fault in the program code. It was estimated that about $15 million was taken out of the accounts of about 100,000 customers. The lessons that can be learned here are:

o Employ professional programmers to reduce the number of errors in the coding of the software system.
o Users and programmers should communicate frequently to check that the specifications are being met.

London Stock Exchange: A software system named Taurus was set up to move the London Stock Exchange from paper communication to an automated system. The system failed because no clear specification was given to the developers; with the different ideas of many people being merged into one system, the software developed errors that led to its failure and caused damages of about 500 million. The lessons learned here are:

o Users should know what they want before they ask a developer to design software.
o There should be proper communication between users and developers.

London Ambulance Service: The London Ambulance Service Computer Aided Dispatch (LASCAD) system was set to be developed in 1992 to dispatch ambulances to wherever they were needed, using new computer technologies instead of the manual system that was already in use during the mid-1980s. The existing system consisted of three tasks:

o Call taking: when a call was received, the receptionist had to take down the details and put the form on a conveyor belt while trying to locate the caller on the map.
o Resource identification: the conveyor belt was used to send the forms to an ambulance service employee, who then checked the location and assigned the call to an available unit.
o Resource mobilization: the ambulance operators were then contacted and given the details of where the emergency was so they could go.

V. CONCLUSIONS

From the software system failures discussed above, we realise that before any software system can be deployed into real-time situations, it needs to be tested using efficient quality assurance and testing techniques. Also, major problems start in the program code, so developers have to pay attention to that aspect of the design. The other contributing factors are those listed above. Software systems are designed every day in every part of the world, so software failures can't be avoided entirely, but with the right testing techniques they can be reduced.

ACKNOWLEDGMENT

This project is the result of the encouragement of many people who helped shape it and provided feedback, direction, and valuable support. It is with hearty gratitude that we acknowledge their contributions to our project. We would like to thank our internal guide, Dr. G. N. Srinivasan, Professor, Department of Information Science & Engineering, RVCE, for his guidance and his suggestions for improving and implementing the project. We are also grateful to the HOD, Dr. Ramakanth Kumar P, Department of Information Science & Engineering, RVCE, for permitting us to take up this project and for his encouragement. We thank our Principal, RVCE, who has always been a great source of inspiration. We thank RSST for the infrastructure and facilities provided, which helped in the successful completion of the project.

Last, but not least, we would like to thank our family and friends, who provided us with valuable suggestions to improve our project.
