Friday, March 21, 2014

Are unit tests waste?

[Edit]
If you're interested in how to get real confidence out of your investment in testing, see Risk, acceptance tests, why unit tests are not enough, and what can you do about it

A co-worker sent me this article by James O. Coplien a few days ago: http://www.rbcs-us.com/documents/Why-Most-Unit-Testing-is-Waste.pdf.
If you get past the title (which you should), and don’t take everything it says as “the truth”, there are a lot of interesting things we can take from it. This is my personal summary of the article, but I encourage everyone to read the original.
You can apply many of these points personally; others would require team agreement, and some would even require changes to wider policies:
People confuse automated tests with unit tests: so much so that when I criticise unit testing, people rebuke me for criticising automation.
Keep this in mind whilst reading the article (or this summary). This is not about removing automated tests; it is about the value of testing one method in isolation (a unit) versus testing at the system or sub-system level.
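To make the distinction concrete, here is a minimal sketch in Java (all the names are made up for illustration, and it assumes JUnit) contrasting the two scopes:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class ScopeTest {
        // A tiny made-up sub-system: a checkout that delegates tax to a policy.
        interface TaxPolicy { int taxCents(int netCents); }
        static class FlatTax implements TaxPolicy {
            public int taxCents(int netCents) { return netCents / 10; } // 10%
        }
        static class Checkout {
            private final TaxPolicy tax;
            Checkout(TaxPolicy tax) { this.tax = tax; }
            int totalCents(int netCents) { return netCents + tax.taxCents(netCents); }
        }

        // Unit scope: one method in isolation, with the collaborator stubbed out.
        @Test void unitScope() {
            Checkout checkout = new Checkout(net -> 0); // stub: no tax
            assertEquals(100, checkout.totalCents(100));
        }

        // Sub-system scope: the same behaviour with the real collaborator wired in.
        // A failure here tells you the composition is wrong, which is the level
        // the article argues carries more information per test.
        @Test void subSystemScope() {
            Checkout checkout = new Checkout(new FlatTax());
            assertEquals(110, checkout.totalCents(100));
        }
    }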

(...) don’t forget the Product Owner perspective in Scrum or the business analyst or Program Manager: risk management is squarely in the center of their job, which may be why Jeff Sutherland says that the PO should conceive (and at best design) the system tests as an input to, or during, Sprint Planning (...) Software engineering research has shown that the most cost-effective places to remove bugs are during the transition from analysis and design, in design itself, and in the disciplines of coding. It’s much easier to avoid putting bugs in than to take them out (...) one of my favourite cynical quotes is, “I find that weeks of coding and testing can save me hours of planning.” (...) “There’s something really sloppy about this ‘fail fast’ culture in that it encourages throwing a bunch of pasta at the wall without thinking much… in part due to an over-confidence in the level of risk mitigation that unit tests are achieving”.
This is said in many other places, but it bears repeating. Translated into process: take your time in design meetings; don’t rush them, and think things through. Do your best to get it right the first time. Don’t leave it to “the QA process” to find your errors later.
(...) you can model any program as a Turing tape, and what the program can do is somehow related to the number of bits on that tape at the start of execution. If you want to thoroughly test that program, you need a test with at least the same amount of information: i.e., another Turing tape of at least the same number of bits (…) to do complete testing, the number of lines of code in unit tests would have to be orders of magnitude larger than those in the unit under test (...) Few developers admit that they do only random or partial testing and many will tell you that they do complete testing for some assumed vision of complete. Such visions include notions such as: "Every line of code has been reached," which, from the perspective of theory of computation, is pure nonsense in terms of knowing whether the code does what it should. Unit tests are unlikely to test more than one trillionth of the functionality of any given method in a reasonable testing cycle. Get over it. (Trillion is not used rhetorically here, but is based on the different possible states given that the average object size is four words, and the conservative estimate that you are using 16-bit words).
These are some of the reasons why we absolutely need automated Acceptance Tests. It also links to the fact that we shouldn’t care about “impossible” scenarios: if something can’t happen in production then we don’t need to test it. That in turn links back to minimizing configuration options, because the number of combinations to cover grows exponentially with the number of options.
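As a back-of-the-envelope illustration of that last point (my sketch, not from the article): with n independent boolean configuration options, exhaustive coverage needs 2^n combinations of every affected scenario.

    public class ConfigExplosion {
        public static void main(String[] args) {
            for (int options = 1; options <= 20; options++) {
                long combinations = 1L << options; // 2^options
                System.out.printf("%2d options -> %,d combinations%n", options, combinations);
            }
            // 10 options -> 1,024 combinations; 20 options -> 1,048,576.
            // Every option you remove halves the space you would have to cover.
        }
    }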
The purpose of testing is to create information about your program. (Testing does not increase quality; programming and design do. Testing just provides the insights that the team lacked to do a correct design and implementation.)
Never, ever, lose sight of this.
The third tests to throw away are the tautological ones. I see more of these than you can imagine — particularly in shops following what they call test-driven development (...) However, as with most unit tests, it’s better to make this an assertion than to pepper your test framework with such checks (...) When I look at most unit tests (...) they are assertions in disguise. When I write a great piece of software I sprinkle it with assertions that describe promises that I expect the callers of my functions to live up to, as well as promises that function makes to its clients. Those assertions evolve in the same artefact as the rest of my code (...) Turn unit tests into assertions. Use them to feed your fault-tolerance architecture on high-availability systems. This solves the problem of maintaining a lot of extra software modules that assess execution and check for correct behavior; that’s one half of a unit test. The other half is the driver that executes the code: count on your stress tests, integration tests, and system tests to do that.
This is what Code Contracts should be doing for us. Unfortunately, Code Contracts seems to be taking too long to become a finished product, but if you choose to drop it then you should at least replace the contracts with Debug.Assert calls.
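The idea translates outside .NET too. Here is a minimal sketch in Java, where the built-in assert keyword (off by default, enabled at runtime with the -ea flag) plays the role of Debug.Assert; the Account class is made up:

    public final class Account {
        private long balanceCents;

        public void withdraw(long amountCents) {
            // Preconditions: the promises callers must live up to.
            assert amountCents > 0 : "withdrawal must be positive";
            assert amountCents <= balanceCents : "insufficient funds";

            long before = balanceCents;
            balanceCents -= amountCents;

            // Postcondition: the promise this method makes to its clients.
            // This is the "checking" half of a unit test, living in the code itself;
            // stress, integration and system tests then act as the "driver" half.
            assert balanceCents == before - amountCents;
        }

        public long balanceCents() { return balanceCents; }
    }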
(...) one question to ask about every test is: If this test fails, what business requirement is compromised? Most of the time, the answer is, "I don't know." If you don't know the value of the test, then the test theoretically could have zero business value. The test does have a cost: maintenance, computing time, administration, and so forth. That means the test could have net negative value.
Make sure you link your automated acceptance tests back to the original acceptance test, which in turn should be linked to a requirement (User Story, Use Case, etc.). If you’re using a BDD framework, you might even be able to treat the automated tests as your actual acceptance tests.
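One lightweight way to keep that link visible is to tag each automated test with the requirement it protects. A sketch assuming JUnit 5 and a made-up story ID:

    import org.junit.jupiter.api.DisplayName;
    import org.junit.jupiter.api.Tag;
    import org.junit.jupiter.api.Test;

    class CheckoutAcceptanceTest {
        @Test
        @Tag("US-042") // hypothetical User Story ID: a failure report traces straight to a requirement
        @DisplayName("US-042: a registered customer can check out with a saved card")
        void registeredCustomerChecksOutWithSavedCard() {
            // drive the feature end to end through its public interface
        }
    }

When a test like this fails, the answer to “what business requirement is compromised?” is right there in the name.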
If you cannot tell how a unit test failure contributes to product risk, you should evaluate whether to throw the test away. There are better techniques to attack quality lapses in the absence of formal correctness criteria, such as exploratory testing and Monte Carlo techniques. (Those are great and I view them as being in a category separate from what I am addressing here.) Don’t use unit tests for such validation.
Again, Acceptance vs Unit Tests. They aren’t the same and Acceptance Tests that link directly to requirements provide much more value.
Most programmers believe that source line coverage, or at least branch coverage, is enough. No. From the perspective of computing theory, worst-case coverage means investigating every possible combination of machine language sequences, ensuring that each instruction is reached, and proving that you have reproduced every possible configuration of bits of data in the program at every value of the program counter. (It is insufficient to reproduce the state space for just the module or class containing the function or method under test: generally, any change anywhere can show up anywhere else in a program and requires that the entire program can be retested.)
Long fragment but very interesting. This is why we need to minimize the number of interactions in a system.
Even if his explanation seems correct “in general”, a good design can (and should) mitigate this by applying encapsulation, SRP, cohesion, preferring immutable state where possible, etc. Make each module in the system a black-box API to other modules, and define its interactions with well-defined contracts.
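A sketch of what that black-box boundary can look like in code (Java, hypothetical names): callers depend on the interface and its documented contract, never on the implementation, so internal state cannot leak into the rest of the program’s state space.

    public interface RateLimiter {
        /**
         * Contract: returns true at most N times per second for a given key,
         * where N is fixed when the implementation is constructed; never throws
         * for a non-null key; safe to call from multiple threads. How permits
         * are tracked internally is deliberately invisible here, so the
         * implementation can change without retesting its callers.
         */
        boolean tryAcquire(String key);
    }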
(...) The classes he was testing are code. The tests are code. Developers write code. When developers write code they insert about three system-affecting bugs per thousand lines of code. If we randomly seed my client’s code base — which includes the tests — with such bugs, we find that the tests will hold the code to an incorrect result more often than a genuine bug will cause the code to fail!
Interesting point.
The numbers are clear, but I believe he’s not considering that tests should be much simpler than “production code”, so their bug rate should be much lower (the number of bugs grows faster than linearly with complexity, so simpler code has disproportionately fewer bugs).
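The way to earn that lower bug rate is to keep tests straight-line. A small self-contained sketch of the difference (Java 16+ for the record syntax; everything here is made up):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import java.util.List;
    import org.junit.jupiter.api.Test;

    class ReportTest {
        record Order(int totalCents, boolean cancelled) {}
        static int reportTotal(List<Order> orders) {          // the code under test
            return orders.stream().filter(o -> !o.cancelled())
                         .mapToInt(Order::totalCents).sum();
        }

        // Risky: the test re-implements the logic it checks, so a bug in the
        // loop below can "confirm" the same bug in the production code.
        @Test void totalRiskyVersion() {
            List<Order> orders = List.of(new Order(100, false), new Order(250, true));
            int expected = 0;
            for (Order o : orders) if (!o.cancelled()) expected += o.totalCents();
            assertEquals(expected, reportTotal(orders));
        }

        // Better: straight-line, with a fixed input and a hand-computed expectation.
        @Test void total() {
            List<Order> orders = List.of(new Order(100, false), new Order(250, true));
            assertEquals(100, reportTotal(orders));
        }
    }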
Create system tests with good feature coverage (not code coverage) — remembering that proper response to bad inputs or other unanticipated conditions is part of your feature set.
No comments. Just do it.
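What feature coverage can look like in practice, sketched as test names only (hypothetical feature, Java/JUnit): one test per observable behaviour, with the bad-input behaviours treated as first-class features.

    import org.junit.jupiter.api.Test;

    class RegistrationFeatureTest {
        @Test void acceptsAValidEmailAddress()                 { /* happy path */ }
        @Test void rejectsAMalformedEmailWithAClearError()     { /* bad input is a feature */ }
        @Test void rejectsADuplicateRegistration()             { /* business rule */ }
        @Test void survivesAnOversizedPayloadWithoutCrashing() { /* robustness is a feature */ }
    }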
In summary (from the article itself, not my words; some of these I consider controversial, the rest I consider things we should already be doing):
  1. Keep regression tests around for up to a year — but most of those will be system-level tests rather than unit tests.
  2. Keep unit tests that test key algorithms for which there is a broad, formal, independent oracle of correctness, and for which there is ascribable business value.
  3. Except for the preceding case, if X has business value and you can test X with either a system test or a unit test, use a system test — context is everything. (My note: I understand he’s using “unit test” to refer to scope, i.e. one method; speed is a separate concern here.)
  4. Design a test with more care than you design the code.
  5. Turn most unit tests into assertions.
  6. Throw away tests that haven’t failed in a year.
  7. Testing can’t replace good development: a high test failure rate suggests you should shorten development intervals, perhaps radically, and make sure your architecture and design regimens have teeth.
  8. If you find that individual functions being tested are trivial, double-check the way you incentivize developers’ performance. Rewarding coverage or other meaningless metrics can lead to rapid architecture decay.
  9. Be humble about what tests can achieve. Tests don’t improve quality: developers do.

2 comments:

Anonymous said...

Unit-tests are not an acceptable substitute for any other kind of test. Unit-tests guide design at the coding level. Unit-tests make refactoring an acceptable risk. Refactoring production code without unit-tests has an unacceptable chance of introducing bugs. Refactoring doubles or quadruples the practical lifetime of a piece of software, and this is the value of unit-tests (IMO).

RG said...

Hi.

Thanks for your comment.

If we agree that:

- unit tests test only one method
- refactoring is to change implementation without affecting functionality

then by definition unit tests in many cases don't support refactoring. Once you modify a method you'll have to modify its tests, and no unit tests will be testing the interaction of that method with others. If you decide to remove a method then you'll have to remove its unit tests, and again nothing will be testing that interaction (other than the compiler if you're in a compiled language).

A higher-level test (system, end-to-end, etc.) which represents an actual requirement (acceptance criteria) will support refactoring by guaranteeing that the acceptance criteria are still satisfied once you have modified the implementation details that satisfied them in the first place.
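A small self-contained sketch of that point (hypothetical code): the test below pins the observable behaviour only, so the implementation can be split, merged or rewritten without touching it; that is exactly the safety net refactoring needs.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class SlugTest {
        // Public behaviour: turn a title into a URL-safe slug.
        // v1 is one big method; v2 might split it into normalize() + join(),
        // or delete helpers that existed before. Method-level unit tests would
        // have to be rewritten for each shape; this test never changes.
        static String slugify(String title) {
            return title.trim().toLowerCase()
                        .replaceAll("[^a-z0-9]+", "-")
                        .replaceAll("(^-|-$)", "");
        }

        @Test void titlesBecomeUrlSafeSlugs() {
            assertEquals("hello-world", slugify("  Hello, World! "));
        }
    }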

I believe that's what the article is trying to highlight, and in any case that's what I'm taking from it. We need to pay more attention to higher-level testing and stop trusting unit tests as a way to verify our programs. TDD is a design tool, not a verification tool. At most, unit tests can guarantee that small components work correctly, but in no case should we work on the assumption that high unit test coverage guarantees the quality of the application.