Nobody likes failure, and its negative overtones make it an unpopular topic, but the agile concept of ‘fail fast’ has a subtlety that is lost on anyone unfamiliar with agile, lean, and DevOps thinking in software development.
The general press and media (Financial Times, Forbes, New Yorker, etc.) in particular like to see it as a sign of madness emanating from Silicon Valley, and like to associate it with ‘fail often’.
Anyone familiar with agile/lean/DevOps will not recognise ‘fail often’ at all, and this concept does sound like madness. The funny thing is that there is a university course (and book) called ‘Fail Fast, Fail Often’, run by two psychologists straight out of Silicon Valley at Stanford University.
This unfortunately only muddies the picture. In the world of software development ‘fail often’ is hardly an issue as software failure is a fact of life and, I can assure everyone, is not desirable at all.
However, failing fast is highly desirable, and this distinction appears to be lost in the general media. ‘Fail fast’ is all about the second word: it is about reducing delay.
In software development, the point about ‘fail fast’ is that if a failure is going to take place you want to reduce the time lag in a) detecting the failure, and b) relaying the detection back to the responsible developer. Let us deal with these points in turn.
The cost of repairing software with a defect tends to rise exponentially the longer the defect remains undetected. The worst case is for the failure to be discovered by the customer as this entails creating a patch or a new release and has a damaging effect on the quality reputation of the vendor/supplier.
This worst case scenario, in which the defect is discovered only after the product has shipped to the customer, also involves the longest gap between the defect being introduced and the defect being discovered.
It runs the risk that other software has been written to depend on the defect, so that fixing the defect could break that other software (in the very worst case, the defect could be a deep architectural issue).
So the lag in discovery carries additional risk. If discovery is very late indeed, say years after the software is delivered, it is possible that the original developers are no longer with the supplier and so there is additional risk in actually finding the right people to fix the defect.
The best case scenario is for the defect to be detected by the developer who introduced it, while running a unit test at the time the code is written. Such rapid detection means the fallout from the defect is minimised.
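To make the best case concrete, here is a minimal sketch of a unit test written alongside the code it checks. The function `apply_discount` and its rules are hypothetical, invented purely for illustration; the point is only that a defect in it would surface seconds after it was typed, on the desk of the developer who typed it.

```python
import unittest


def apply_discount(price, percent):
    """Return price reduced by percent (hypothetical example function)."""
    if not 0 <= percent <= 100:
        # Reject bad input immediately rather than return a nonsense price.
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)


class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertAlmostEqual(apply_discount(200.0, 10), 180.0)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)


def run_tests():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` right after editing `apply_discount` gives the developer a pass or fail verdict while the change is still in front of them.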
The second point deals with the speed of relaying the defect discovery back to the developer. In the worst case scenario, where the defect is discovered by the customer, there is now the question of whether the customer informs the supplier. If the customer fails to inform the supplier, the supplier remains ignorant of the defect, the discovery is wasted, and the defect remains in place.
The best case scenario is where the developer is informed within minutes or a few hours of the defect being introduced, for example through running unit tests or continuous integration and testing using regression test suites. The reduced lag in informing the developer means the defect can be easily fixed with the code still fresh in the mind of the developer.
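As one narrow, tool-level illustration of shortening that feedback loop, the sketch below runs a small regression suite with the `failfast` option of Python's standard `unittest` runner, which stops at the first failing test instead of working through the rest. The three checks are hypothetical, and the middle one is a deliberately simulated defect; a real continuous integration setup would involve far more than this.

```python
import io
import unittest


def build_regression_suite():
    """Build a tiny, hypothetical regression suite for illustration."""

    class RegressionChecks(unittest.TestCase):
        def test_a_parsing(self):
            self.assertEqual(int("42"), 42)

        def test_b_simulated_defect(self):
            # Deliberately wrong expectation, standing in for a real defect.
            self.assertEqual(sum([1, 2, 3]), 7)

        def test_c_not_reached(self):
            self.assertTrue(True)

    return unittest.defaultTestLoader.loadTestsFromTestCase(RegressionChecks)


def run_fail_fast():
    """Run the suite, stopping at the first failure, and return the result."""
    report = io.StringIO()  # capture the report instead of printing it
    runner = unittest.TextTestRunner(stream=report, failfast=True)
    return runner.run(build_regression_suite())
```

Because the loader orders test methods alphabetically, the run stops after the second check and the third is never executed, so the failure report reaches the developer as early as the suite allows.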
These two points combined explain why failing fast is so helpful in software development. As the above examples illustrate, running unit tests is a great agile practice that helps achieve ‘failing fast’.
So are frequent releases to the end user at the end of each agile iteration, as they allow defects to be discovered more quickly than if the product is shipped at the end of a long waterfall lifecycle.
Discovering issues at the end of agile iterations is also part of the fail fast concept, as it steers the project to success during development, rather than creating a lot of software before showing it to the end user. Fail fast is a useful and important agile concept.