Friday, May 9, 2014

Risk, acceptance tests, why unit tests are not enough, and what you can do about it

It's time to release again (hopefully only 1-2 weeks after your previous release) and you must answer the question: how confident are you that the software is going to do what it's supposed to do?

It's ultimately the Product Owner's decision whether to release or not, but in order to make that decision she must answer that very same question. Based on what? What information can we give her to support her decision?

Most teams go about it like this:
Hey, all unit tests and integration tests are passing and we have 98% test coverage! We've manually checked that the software satisfies the acceptance criteria for the new stories we played, and we also did some manual regression testing. Everything seems to be OK. We're happy to release.
For a team to state they're happy to release when this is all the information they can provide is an act of faith in their own skills. They're assuming that because the unit tests pass, the software is going to work as expected.

Unfortunately we've all been trained to believe fully in our unit tests and in how good we are: surely if we test the individual methods, and then the interactions by faking dependencies, we can infer that everything works, right?

Well, I'd argue there are a lot of assumptions and untested implications in the proposition "the system works because we have good unit test coverage and those tests are passing". See my previous post about the value of unit tests, and by all means, do your Googling... the subject's been heating up recently.

Long story short, you may have all the unit tests in the world and 100% code coverage (whatever that means), and still have no certainty that the software is going to work as the developers intended. More importantly, because unit tests verify that the software does what the developers wanted, you'll have even less certainty that the software is going to do what the Product Owner wanted!

Here's a good exercise to get an idea of the assumptions you make in this process: take a couple of user stories you have already delivered and released, and try to map out the unit tests that guarantee each user story works as expected.

Take a minute and think about how you would go about it right now...

The first warning sign should be that you had never thought about this, which is usually the case.

In most cases just imagining the exercise should be enough, but if you go ahead and are actually able to do it, you'll end up either with the realization that you have blind spots (particularly around interactions) or, in the best case, with a big mess of tests scattered all around the code with no clear relation to any particular acceptance test.

If your Product Owner is not very technical, then deciding to release after being given the above assessment of the release's quality is an act of faith in the team. One most of us have been part of. If the Product Owner happens to be techy, then she's probably into the unit tests "religion" and is as convinced as everyone else that you can release with very high (and false) certainty that things are going to work as expected.

OK, you got me thinking. What can I do?


What if you could provide the following assessment on the quality:
Hey, all unit tests are passing, all the end-to-end automated acceptance tests which run in a production-like environment are passing. We've done some manual exploratory testing and didn't find any issues.
The automated acceptance tests cover 83% of all the acceptance tests, 98% of those considered critical and 95% of the new acceptance tests we've just implemented.
We're happy to release.
If you've followed your process correctly, then the acceptance tests represent everything your Product Owner cares about. If you can tell the Product Owner that 83% of the acceptance tests have been verified in a production-like environment, you're basically telling her that at least 83% of the features are known to work (the only remaining risk being how different the test environment is from production). And she can now make an informed decision of the type "no, I really need X and Y to be verified" or "great, let's go ahead".
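As a rough illustration of where figures like those could come from, here is a minimal sketch (in Python for brevity; the test catalogue, story IDs, and tagging scheme are entirely hypothetical, not any real tool's format) of computing coverage from a list of acceptance tests tagged with their criticality and automation status:

```python
# Hypothetical catalogue of acceptance tests; in practice this would be
# generated from your stories / BDD specs, not hand-written.
from dataclasses import dataclass

@dataclass
class AcceptanceTest:
    story: str        # the user story this test belongs to
    critical: bool    # is the behaviour considered critical?
    automated: bool   # does an end-to-end automated test verify it?

catalogue = [
    AcceptanceTest("US-101", critical=True,  automated=True),
    AcceptanceTest("US-101", critical=False, automated=True),
    AcceptanceTest("US-102", critical=True,  automated=False),
    AcceptanceTest("US-103", critical=False, automated=True),
]

def coverage(tests):
    """Fraction of acceptance tests verified by an automated end-to-end test."""
    return sum(t.automated for t in tests) / len(tests)

overall = coverage(catalogue)
critical = coverage([t for t in catalogue if t.critical])
print(f"overall: {overall:.0%}, critical: {critical:.0%}")
```

The exact numbers don't matter; the point is that the report to the Product Owner is computed from an explicit list of acceptance tests, not from gut feeling.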

But isn't the end-to-end automation of acceptance tests expensive?


Yes, it is. It will consume time you could spend doing something else, like implementing new features. But assuming your software is of reasonable complexity, it very quickly becomes impossible to manually verify every acceptance test, which is why I included risk in the title.

It's all about managing risk. If the Product Owner and the team are happy to live with the risk of regressions or bad behaviour, then it makes sense to cut down on this automation, or apply it only to critical areas. Usually that's not the case, though, and the only reason releases get approved is that there's no clear picture of what the certainty actually is.

Here's another exercise (more of a challenge, really): next time you have to release, create a list of all the acceptance tests in the stories you have already delivered and released. Then try to explain to your Product Owner how many of those have been validated for this release. Feel free to do all the magic and hand-waving you want, just stay honest.

You can say "this acceptance test is covered because we have this and this and this unit test, you see...", and "this other one was tested manually, and because it uses most of the same components as acceptance test X, we consider X to have been validated as well". This one might also work: "we haven't touched this component for a long time, and because users haven't complained so far we can safely assume it's working properly".

See how much confidence you can get from your Product Owner for that release.

If your application is in any way critical to your company, by this point you'll probably have a very pallid face in front of you. Clear exceptions might be some start-ups, experiments, prototypes, etc.

After perhaps a bit of heated arguing, and after pointing out that it's unfeasible to manually verify every single acceptance test before each release, this would be the time to propose investing in the automation of your acceptance tests, in a production-like environment.

Note that all this is closely related to BDD, which has been around for a long time. But most people associate BDD with a particular framework, and BDD also implies that you write your tests first, which is not the point I'm trying to make here.

Any hints on automating the acceptance tests?


Actually, yes, I have a couple of things we've been trying that I haven't seen anywhere else, like having a single set of tests that can run fast, without any I/O, and also against a fully-deployed environment.

Being able to run those tests fast gives developers very quick feedback on their changes, much like unit tests do, especially if you use a tool like NCrunch.

The fully-deployed version will be much slower, so it should probably run as part of your continuous integration, but it will give you much more confidence in return.
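To make the idea concrete, here is a minimal sketch (in Python, with illustrative names; not our actual implementation) of how one and the same acceptance test can run against either a fast in-process driver with no I/O, or a driver that talks to the fully-deployed system:

```python
# One acceptance test, two interchangeable drivers: the test only speaks
# in business terms through an abstract driver interface.
from abc import ABC, abstractmethod

class AppDriver(ABC):
    """What an acceptance test is allowed to do, in business terms."""
    @abstractmethod
    def place_order(self, item: str) -> None: ...
    @abstractmethod
    def orders(self) -> list[str]: ...

class InProcessDriver(AppDriver):
    """Runs the domain logic directly, no I/O: fast feedback while coding."""
    def __init__(self):
        self._orders: list[str] = []
    def place_order(self, item: str) -> None:
        self._orders.append(item)
    def orders(self) -> list[str]:
        return list(self._orders)

class DeployedDriver(AppDriver):
    """Would drive the real, deployed system (e.g. over HTTP); slower but
    higher confidence. Shown here only as a placeholder."""
    def __init__(self, base_url: str):
        self.base_url = base_url
    def place_order(self, item: str) -> None:
        raise NotImplementedError("POST to the deployed app here")
    def orders(self) -> list[str]:
        raise NotImplementedError("GET from the deployed app here")

def acceptance_test_ordering(driver: AppDriver) -> None:
    """The exact same acceptance test runs against either driver."""
    driver.place_order("book")
    assert driver.orders() == ["book"]

acceptance_test_ordering(InProcessDriver())  # fast, no I/O
print("ordering acceptance test passed")
```

The same `acceptance_test_ordering` function would be pointed at a `DeployedDriver` in CI, so you maintain one set of tests, not two.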

I'll try to post more details about this approach soon, but whatever you do, please start being objective about the quality of the software you're releasing. If you decide to take the step and start to automate your acceptance tests systematically, then make sure you invest in quality: you'll need to maintain this code together with your production code.

Friday, March 21, 2014

Are unit tests waste?

[Edit]
If you're interested in how to get real confidence out of your investment in testing, see Risk, acceptance tests, why unit tests are not enough, and what you can do about it

A co-worker sent me this article by James O. Coplien a few days ago: http://www.rbcs-us.com/documents/Why-Most-Unit-Testing-is-Waste.pdf.
If you get past the title (which you should), and don’t take everything it says as “the truth”, there are a lot of interesting things we could take from it. This is my personal summary of the article, but I encourage everyone to read it in full.
You can apply many of these points personally; others would require team agreement, and others would even require changes to wider policies:
People confuse automated tests with unit tests: so much so that when I criticise unit testing, people rebuke me for criticising automation.
Keep this in mind whilst reading the article (or this summary). This is not about removing automated tests; it's about the value of testing one method in isolation (a unit) versus testing at the system or sub-system level.

(...) don’t forget the Product Owner perspective in Scrum or the business analyst or Program Manager: risk management is squarely in the center of their job, which may be why Jeff Sutherland says that the PO should conceive (and at best design) the system tests as an input to, or during, Sprint Planning (...) Software engineering research has shown that the most cost-effective places to remove bugs are during the transition from analysis and design, in design itself, and in the disciplines of coding. It’s much easier to avoid putting bugs in than to take them out (…) one of my favourite cynical quotes is, “I find that weeks of coding and testing can save me hours of planning.” (...) “There’s something really sloppy about this ‘fail fast’ culture in that it encourages throwing a bunch of pasta at the wall without thinking much… in part due to an over-confidence in the level of risk mitigation that unit tests are achieving”.
This is said in many other places but you can never say it enough. Translated into process: take your time in the design meetings, don’t rush it and think it through! Do your best to get it right the first time. Don’t leave it to “the QA process” to find your errors later.
(...) you can model any program as a Turing tape, and what the program can do is somehow related to the number of bits on that tape at the start of execution. If you want to thoroughly test that program, you need a test with at least the same amount of information: i.e., another Turing tape of at least the same number of bits (…) to do complete testing, the number of lines of code in unit tests would have to be orders of magnitude larger than those in the unit under test (...) Few developers admit that they do only random or partial testing and many will tell you that they do complete testing for some assumed vision of complete. Such visions include notions such as: "Every line of code has been reached," which, from the perspective of theory of computation, is pure nonsense in terms of knowing whether the code does what it should. Unit tests are unlikely to test more than one trillionth of the functionality of any given method in a reasonable testing cycle. Get over it. (Trillion is not used rhetorically here, but is based on the different possible states given that the average object size is four words, and the conservative estimate that you are using 16-bit words).
These are some of the reasons why we absolutely need automated Acceptance Tests. It also links to the fact that we shouldn’t care about “impossible” scenarios: if something can’t happen in production then we don’t need to test it, which in turn links back to minimizing configuration options because each option increases the testing effort exponentially.
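A quick back-of-the-envelope calculation shows why configuration options are so costly: each independent boolean option doubles the number of distinct configurations you would have to test exhaustively. A trivial illustration (in Python):

```python
# Each independent boolean configuration option doubles the number of
# distinct configurations, so exhaustive testing effort grows as 2^n.
def configurations(boolean_options: int) -> int:
    return 2 ** boolean_options

for n in (1, 5, 10, 20):
    print(n, "options ->", configurations(n), "configurations")
```

With just 20 boolean options you are already past a million configurations, which is why removing an option is often worth more than adding a test.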
The purpose of testing is to create information about your program. (Testing does not increase quality; programming and design do. Testing just provides the insights that the team lacked to do a correct design and implementation.)
Never, ever, lose sight of this.
The third tests to throw away are the tautological ones. I see more of these than you can imagine — particularly in shops following what they call test-driven development (...) However, as with most unit tests, it’s better to make this an assertion than to pepper your test framework with such checks (...) When I look at most unit tests (...) they are assertions in disguise. When I write a great piece of software I sprinkle it with assertions that describe promises that I expect the callers of my functions to live up to, as well as promises that function makes to its clients. Those assertions evolve in the same artefact as the rest of my code (...) Turn unit tests into assertions. Use them to feed your fault-tolerance architecture on high-availability systems. This solves the problem of maintaining a lot of extra software modules that assess execution and check for correct behavior; that’s one half of a unit test. The other half is the driver that executes the code: count on your stress tests, integration tests, and system tests to do that.
This is what Code Contracts should be doing for us. Unfortunately Code Contracts seems to be taking too long to become a finished product but if you choose to get rid of it then you should at least replace the contracts with Debug.Assert.
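In .NET that idea maps to Code Contracts or Debug.Assert, as noted above. The underlying technique, sketched here in Python with a hypothetical `transfer` function purely for illustration, is to state a function's promises inline instead of in a separate, tautological unit test:

```python
# "Turn unit tests into assertions": the function carries its own
# precondition and postcondition checks, which run on every call.
def transfer(balance: int, amount: int) -> int:
    # Preconditions: promises the caller must live up to.
    assert amount > 0, "amount must be positive"
    assert amount <= balance, "cannot overdraw"
    new_balance = balance - amount
    # Postcondition: the promise this function makes to its clients.
    assert new_balance == balance - amount
    return new_balance

print(transfer(100, 30))  # 70
```

The driver half of the old unit test (the code that actually exercises the function) is then left to your stress, integration, and system tests, as the article suggests.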
(...) one question to ask about every test is: If this test fails, what business requirement is compromised? Most of the time, the answer is, "I don't know." If you don't know the value of the test, then the test theoretically could have zero business value. The test does have a cost: maintenance, computing time, administration, and so forth. That means the test could have net negative value.
Make sure you link your automated acceptance tests back to the original acceptance test, which in turn should be linked to a requirement (User Story, Use Case, etc.). If you're using a BDD framework then you might have the option of treating the automated tests as your actual acceptance tests.
If you cannot tell how a unit test failure contributes to product risk, you should evaluate whether to throw the test away. There are better techniques to attack quality lapses in the absence of formal correctness criteria, such as exploratory testing and Monte Carlo techniques. (Those are great and I view them as being in a category separate from what I am addressing here.) Don’t use unit tests for such validation.
Again, Acceptance vs Unit Tests. They aren’t the same and Acceptance Tests that link directly to requirements provide much more value.
Most programmers believe that source line coverage, or at least branch coverage, is enough. No. From the perspective of computing theory, worst-case coverage means investigating every possible combination of machine language sequences, ensuring that each instruction is reached, and proving that you have reproduced every possible configuration of bits of data in the program at every value of the program counter. (It is insufficient to reproduce the state space for just the module or class containing the function or method under test: generally, any change anywhere can show up anywhere else in a program and requires that the entire program can be retested.)
Long fragment, but very interesting. This is why we need to minimize the number of interactions in a system.
Even if, in general, his explanation is correct, a good design can (and should) mitigate this by applying encapsulation, SRP, cohesion, preferring immutable state where possible, etc. Make each module in the system a black-box API to other modules, and define the interactions with well-defined contracts.
(...) The classes he was testing are code. The tests are code. Developers write code. When developers write code they insert about three system-affecting bugs per thousand lines of code. If we randomly seed my client’s code base — which includes the tests — with such bugs, we find that the tests will hold the code to an incorrect result more often than a genuine bug will cause the code to fail!
Interesting point.
The numbers are clear, but I believe he's not considering that tests should be much simpler than “production code”, so their bug ratio should be much lower (the number of bugs is not a linear function of complexity, but exponential or similar).
Create system tests with good feature coverage (not code coverage) — remembering that proper response to bad inputs or other unanticipated conditions is part of your feature set.
No comments. Just do it.
In summary (from the article itself, not my words; bold means I consider it controversial, the rest I consider things we should simply be doing):
  1. Keep regression tests around for up to a year — but most of those will be system-level tests rather than unit tests.
  2. Keep unit tests that test key algorithms for which there is a broad, formal, independent oracle of correctness, and for which there is ascribable business value.
  3. Except for the preceding case, if X has business value and you can test X with either a system test or a unit test, use a system test — context is everything. I understand he’s using “unit tests” to refer to the scope (one method); speed is a different concern here.
  4. Design a test with more care than you design the code.
  5. Turn most unit tests into assertions.
  6. Throw away tests that haven’t failed in a year.
  7. Testing can’t replace good development: a high test failure rate suggests you should shorten development intervals, perhaps radically, and make sure your architecture and design regimens have teeth.
  8. If you find that individual functions being tested are trivial, double-check the way you incentivize developers’ performance. Rewarding coverage or other meaningless metrics can lead to rapid architecture decay.
  9. Be humble about what tests can achieve. Tests don’t improve quality: developers do.