Combinatorial Software Test Design – Beyond Pairwise Testing
I put this together to explain combinatorial software test design methods in an accessible manner. I hope you enjoy it and that, if you do, you’ll consider using these methods to create test cases for your next testing project (whether you choose our Hexawise test case generator or some other test design tool).
Where I’m Coming From
As those of you who read my posts and articles and/or have attended my testing conference presentations know, I am a passionate proponent of approaches to software test design that maximize variation from test case to test case and minimize repetition. It’s not much of an exaggeration to say I hardly write or talk publicly about any other software testing-related topic. My own consistent experiences and formal studies indicate that pairwise, orthogonal array-based, and combinatorial test design approaches often lead to a doubling of tester productivity (as measured in defects found per tester hour) as compared to the far more prevalent practice in the software testing industry of selecting and documenting test cases by hand. How is it possible that this approach generates such a dramatic increase in productivity? What is so different between manually selected test cases and pairwise or combinatorial test cases? Why isn’t this test design technique far more broadly adopted than it is?
A Common Challenge to Understanding: Complicated, Wonky Explanations
My suspicion is that a significant reason combinatorial software testing methods are not much more widely adopted is that many of the articles describing them are simply too complex and/or too abstract for many testers to understand and apply. Such articles say things like:
A. Mathematical Model
A pairwise test suite is a t-way interaction test suite where t = 2. A t-way interaction test suite is a mathematical structure, called a covering array.
Definition 1 A covering array, CA(N; t, k, |v|), is an N × k array from a set, v, of values (symbols) such that every N × t subarray contains all tuples of size t (t-tuples) from the |v| values at least once.
The strength of a covering array is t, which defines, for example, 2-way (pairwise) or 3-way interaction test suite. The k columns of this array are called factors, where each factor has |v| values. In general, most software systems do not have the same number of values for each factor. A more general structure can be defined that allows variability of |v|.
Definition 2 A mixed level covering array, MCA (N; t, k, (|v1|,|v2|,…, |vk|)), is an N × k array on |v| values, where
|v| = |v1| + |v2| + … + |vk|, with the following properties: (1) Each column i (1 ≤ i ≤ k) contains only elements from a set Si of size |vi|. (2) The rows of each N × t subarray cover all t-tuples of values from the t columns at least once.
– Jianjun Yuan and Changjun Jiang, “Construct Pairwise Test Suites Based on the Bak-Sneppen Model of Biological Evolution,” World Academy of Science, Engineering and Technology 59, 2009
If you’re a typical software tester, even one motivated to try new methods to improve your skills, you could be forgiven for not mustering up the enthusiasm to read such articles. The relevance, the power, and the applicability of combinatorial testing – not to mention the fact that this test design method can often double your software testing efficiency and increase the thoroughness of your testing – all tend to get lost in the abstract, academic, wonky explanations typically used to describe it. Unfortunately for pragmatic, action-oriented software testing practitioners, many of the readily accessible articles on pairwise and combinatorial testing tend to be on the wonky end of the spectrum; an exception to that general rule is the set of good, practitioner-oriented introductory articles available at combinatorialtesting.com.
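For what it’s worth, stripped of the notation, those definitions describe something quite tangible. Here’s a tiny illustration of my own (not taken from the paper, with arbitrary 0/1 values standing in for real test inputs): for three two-valued test inputs, exhaustive testing would require 2 × 2 × 2 = 8 tests, yet the four tests below already contain every pair of values for every pair of inputs, which is all that a “covering array of strength 2” means.

```python
from itertools import combinations, product

# Four tests over three two-valued factors: a pairwise covering array CA(4; 2, 3, 2).
tests = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

# Verify that every pair of columns contains all four possible value pairs.
for c1, c2 in combinations(range(3), 2):
    covered = {(row[c1], row[c2]) for row in tests}
    assert covered == set(product((0, 1), repeat=2)), (c1, c2)

print("4 tests cover every pair that exhaustive testing would need 8 tests to cover.")
```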
A Different Approach to Explaining Combinatorial Testing and Pairwise Testing
In the photograph-rich, numbers-light presentation embedded above, I’ve tried to explain what combinatorial testing is all about without the wonky-ness. The benefits of structured variation and of using combinatorial test design are, in my view, wildly under-appreciated. The approach has the following extremely important benefits:
- Less repetition from test case to test case
  - In the context of discussing testing’s “pesticide paradox,” James Bach, I believe, used the analogy that following in someone’s footsteps is a very good way to survive traversing a minefield but a generally lousy way to find software defects efficiently.
  - Maximizing variation from test case to test case, as a general rule, is an absolutely spectacular way to find defects quickly.
  - There are thousands, if not trillions, of relevant combinations to select from when identifying test cases to execute; computer algorithms will be able to solve the problem of “how can maximum variation be achieved?” far better than human brains can (see the short sketch after this list).
- More coverage of combinations of test inputs
  - Most of the time, since awareness of pairwise and combinatorial testing methods remains low in the software testing community, combining all possible pairs of values in at least one test case is not even a conscious goal of testers.
  - Even if this were a goal of their test design strategy, testers would have a tremendous challenge in trying to achieve it: with hundreds, thousands, or tens of thousands of targeted combinations to cover, losing track of a significant number of them and/or forgetting to include them in software tests is virtually a foregone conclusion unless a test case generator is used.
  - More thorough coverage leads to more defects being found.
- Efficiency (Testers can “turn the coverage dial” to achieve maximum efficiency with a minimal number of tests)
  - The efficiency and effectiveness benefits of pairwise testing have been demonstrated in testing projects in every major industry.
  - I wanted to prominently include the message that testers using test case generators have the option to dramatically increase the thoroughness of the tests they generate, because that option often gets ignored in pairwise testing case studies and introductions.
- Thoroughness (Testers can also “turn the coverage dial” to achieve maximum thoroughness if that is their goal)
  - Too often, testers view pairwise testing as a technique that focuses on a very small number of curiously strong tests; that is only part of the story.
  - This can lead to the false impression that combinatorial testing methods are inappropriate where high levels of testing thoroughness are required.
  - You can create very different sets of tests that are as thorough as possible (given your understanding of what you are testing) whether you have one hour to execute tests or one month.
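For anyone who would like to see, rather than just read about, what “let an algorithm maximize pair coverage” can look like, here is a minimal sketch in Python. To be clear, this is my own toy illustration with made-up parameter names; it is emphatically not the algorithm Hexawise (or any other tool) actually uses, and a brute-force greedy search like this is only practical for tiny examples.

```python
from itertools import combinations, product

def greedy_pairwise(parameters):
    """Toy greedy pairwise generator: repeatedly pick whichever candidate test
    covers the most not-yet-covered pairs of values."""
    names = list(parameters)
    # Every pair of values that must appear together in at least one test.
    uncovered = {((a, va), (b, vb))
                 for a, b in combinations(names, 2)
                 for va in parameters[a]
                 for vb in parameters[b]}
    tests = []
    while uncovered:
        best_test, best_gain = None, -1
        for candidate in product(*(parameters[n] for n in names)):
            test = dict(zip(names, candidate))
            gain = sum(1 for (a, va), (b, vb) in uncovered
                       if test[a] == va and test[b] == vb)
            if gain > best_gain:
                best_test, best_gain = test, gain
        tests.append(best_test)
        uncovered = {((a, va), (b, vb)) for (a, va), (b, vb) in uncovered
                     if not (best_test[a] == va and best_test[b] == vb)}
    return tests

# Hypothetical, made-up test inputs purely for illustration:
params = {
    "Browser": ["Chrome", "Firefox", "IE"],
    "OS": ["Windows", "macOS"],
    "Payment": ["Visa", "PayPal", "Check"],
}
for i, test in enumerate(greedy_pairwise(params), 1):
    print(i, test)
# Roughly 9 tests cover every pair of values at least once,
# versus 3 * 2 * 3 = 18 tests to cover every possible combination.
```

Real tools replace the brute-force candidate search with much smarter construction methods and handle constraints between values, which is precisely why a test case generator beats trying to keep track of all those pairs by hand.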
Other Recommended Sources of Information on Pairwise and Combinatorial Testing:
- Combinatorial Software Testing (Contains results of a 10-project empirical study)
- Efficient and Effective Software Test Design (Contains some screen shots and worked examples to help make these concepts more concrete)
- Pairwise Testing: a Best Practice that Isn’t (Which has many good cautionary points lest any readers be tempted to embrace these test design methods as a silver bullet cure-all)
- Hexawise (Our test design tool, which includes many explanatory examples and templates of pairwise and combinatorial testing)
Questions or Comments?
If you have questions or comments, please leave a note below. I’d love to hear about people’s experiences using these test design approaches. Thank you.
A friend passed me this set of recent tweets from Wil Shipley, a Mac developer with 11,743 followers on Twitter as of today. Wil recently encountered the familiar problem of what to do when you’ve got more software tests to run than you can realistically execute.
I love that. Who can’t relate?
Now if only there were a good, quick way to reduce the number of tests from over a billion to a smaller, much more manageable set of tests that were “Altoid-like” in their curious strength. 🙂 I rarely use this blog for shameless plugs of our test case generating tool, but I can’t help myself here. The opening is just too inviting. So here goes:
There’s an app for that… See www.hexawise.com for Hexawise, a “pairwise software test case generating tool on steroids.” It eats problems like the one you encountered for breakfast. Hexawise winnows bazillions of possible test cases down, in the blink of an eye, to small, manageable sets of test cases that are carefully constructed to maximize coverage in the smallest number of tests, with the flexibility to adjust the solutions based upon the execution time you have available. In addition to generating pairwise testing solutions, Hexawise also generates more thorough applied statistics-based “combinatorial software testing” solutions that include tests for, say, all possible 6-way combinations of test inputs.
Where your Mac cops an attitude and tells you “Bitch, I ain’t even allocating 1 billion integers to hold your results” and showers you with taunting, derisive sneers, head-waggling, and snaps, all carefully choreographed to let you know where you stand, Hexawise, in contrast, would helpfully tell you: “Only 1 billion total possibilities to select tests from? Pfft! Child’s play. Want to start by testing the 100 or so most powerful tests? Want to execute an extremely thorough set of 10,000 tests? Want to select a thoroughness setting in the middle? Your wish is my command, sir. You tell me approximately how many tests you want to run and the test inputs you want to include, and I’ll calculate the most powerful set of tests you can execute (based on proven applied statistics-based Design of Experiments methods) before you can say ‘I’m Wil Shipley and I like my TED Conference swag.’”
– Justin Hunter
There are good reasons James Bach is so well known among the testing community and constantly invited to give keynote presentations around the globe at software testing conferences. He’s passionate about testing and educating testers; he’s a gifted, energetic, and entertaining speaker with a great sense of humor; and he takes joy in rattling his saber and attacking well-established institutions and schools of thought that he disagrees with. He doesn’t take kindly to people who make inflated claims of benefits that would materialize “if only you’d perform testing in XYZ way or with ABC tool” given that (a) he can always seem to find exceptions to such claims, (b) he doesn’t shy away from confrontation, and (c) he (rightly, in my view) thinks that such benefits statements tend to discount the importance of critical thinking skills being used by testers and other important context-specific considerations.
Leave it to James to create a list of 13 questions that would be great to ask the next software testing tool vendor who shows up to pitch his problem-solving product. In his blog post titled “The Essence of Heuristics,” he posed this exact set of questions in a slightly different context, but, as a software testing tool vendor myself, I found that they really hit home. They are:
1. Do they teach you how to tell if it’s working?
2. Do they teach you how to tell if it’s going wrong?
3. Do they teach you heuristics for stopping?
4. Do they teach you heuristics for knowing when to apply it?
5. Do they compare it to alternative heuristics?
6. Do they show you why it works?
7. Do they help you understand when it probably works best?
8. Do they help you know how to re-design it, if needed?
9. Do they let you own it?
10. Do they ask you to practice it?
11. Do they tell stories about how it has failed?
12. Do they listen to you when you question or challenge it?
13. Do they praise you for questioning and challenging it?
[Side note: Apparently I wasn’t the only one who thought of Hexawise and pairwise/combinatorial test design approaches when they saw these 13 questions. I was amused to see Jared Quinert’s (@xflibble’s) tweet just after I drafted this post:]
Where do I come down on each of James’ 13 questions with respect to the people I talk to about our test design tool, Hexawise, and the types and sizes of benefits it typically delivers? Quite simply, “Yes” to all 13. I enjoy talking about exactly the kinds of questions that James raised in his list. In fact, when I sought out James to ask him questions at a conference in Boston earlier this year, it was because I wanted his perspective on many of the points above, particularly #11 (hearing stories about how he has seen pairwise and combinatorial approaches to test design fail) and #7 (hearing his views on where these approaches work best and where they would be difficult to apply). I’ll save my specific answers for another post, but I am serious about wanting to share my thoughts on them; time constraints are holding me back today. I did give a speech at the ASQ World Conference on Quality Improvement in St. Louis last week, though, that addressed many, but not all, of James’ questions.
I’m not your typical software tool vendor. Basically, my natural instincts are all wrong for sales. I agree with the premise that “a fool with a tool is still a fool”; when talking to target clients and/or potential partners, I’m inclined to point out deficiencies, limitations, and various things that could go wrong; I’m more of an introvert than an extrovert, etc. Not exactly the typical characteristics of a successful salesman… Having said that, I believe we’ve built a very good tool that enables dramatic efficiency and thoroughness benefits in many testing situations, but our tool, along with the pairwise and combinatorial test design approaches it supports, has its limitations. It is primarily by talking to software testers about their positive and negative experiences that our company is able to improve our tool, enhance our training, and provide honest, pragmatic guidance to users about where and how to use our tool (and where and how not to).
Tool vendors who defend their tools (and/or the approaches by which their tools help users solve problems) as magical, silver bullet solutions are being both foolish and dishonest. Tool vendors who choose not to engage in serious, honest, and open discussions with users about the challenges those users face when applying their tools in different situations are being short-sighted. From my own experience, I can say that talking about the 13 topics raised by James has been invaluable.
He includes the statement below, which I agree with. If you are a software tester and have any doubts about whether these methods work (pairwise software testing in particular), I would encourage you to conduct a pilot project on your own and measure the results achieved with and without the technique applied.
From the scientific side, testing can include a number of proven techniques such as equivalency class testing, boundary value analysis, pair-wise testing, etc. These techniques, if used properly, can reduce test times and focus on finding the bugs where they tend to hang out – much like a porch light on a summer night.
My response to Dave’s post, included below, is not especially profound or even well-written, but, hey, I’m in a hurry in the pre-Thanksgiving rush and the topic hit close to home, so I couldn’t resist jotting down a little something. Enjoy. Please let me know your thoughts or reactions if you have any.
Very well said!
I wholeheartedly, enthusiastically agree with your premise. I also wish that more people saw things the same way.
My father co-wrote Statistics for Experimenters, which describes the “art and science” within the Design of Experiments (“DoE”) field of applied statistics. Well-run manufacturing companies use DoE techniques in their manufacturing processes. Many companies, such as Toyota, see them as an absolutely fundamental part of their processes (yet, unfortunately, software testers, who could use DoE techniques such as pairwise and other forms of combinatorial testing, are often ignorant about how to use them properly, and the software testing industry as a whole dramatically under-utilizes such techniques… but I digress).
I brought the book up because it opens with a good example relevant to the points you made. To win at the game of 20 Questions, it is useful to know “the science” of game theory and DoE: choose questions so that there is a 50/50 chance that the answer will be yes. Someone who knows this technique will, all else being equal, win more often because of their “scientific” approach than someone who doesn’t. And yet… other things (subject matter expertise in this example; subject matter expertise and “artistic” Exploratory Testing in your example) are indispensable as well.
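A quick aside on the arithmetic behind that 50/50 rule, in case it is helpful: each yes/no question that splits the remaining possibilities evenly cuts the field in half, so twenty well-chosen questions can distinguish roughly a million candidates. A trivial illustration:

```python
# Each yes/no question that splits the remaining candidates 50/50 halves the
# search space, so n well-chosen questions can distinguish up to 2**n answers.
for n in (10, 20):
    print(n, "questions can distinguish up to", 2 ** n, "possibilities")
# 10 -> 1,024    20 -> 1,048,576
```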
You can’t truly excel at either 20 Questions or software testing unless you have a good mix of “science” (governed by mathematical principles, proven methods of DoE, etc.) and “art” (governed by experience, instincts, and subject matter expertise).
There are some phrases in English that, as often as not, come off sounding obligatory and/or insincere. The phrase “I’m honored…” comes to mind (particularly if someone is accepting an award in front of a room full of people).
Be that as it may, I genuinely felt honored last night and again today by a couple of comments James Bach made about me, including these:
Here’s the quick background:
1. James knows much more about software testing than I do, and I respect his views a lot.
2. He has a reputation for not suffering fools gladly and for pretty bluntly telling people he doesn’t respect them if he doesn’t respect the content of their views.
3. In addition to his extremely broad expertise on “testing in general,” James, like Michael Bolton, knows a lot about pairwise and combinatorial testing methods and how to use them.
4. I firmly (and passionately) believe that pairwise and combinatorial testing methods are (a) dramatically under-appreciated and (b) dramatically under-utilized.
5. James has published a very good and well-reasoned article about some of the limitations of pairwise testing methods that I wanted to talk to him about.
6. I co-wrote an article about combinatorial testing, recently published by IEEE Computer, that I wanted to discuss with him.
7. James and I have both been at the STP Conference in Boston over the past few days.
8. I reached out to him and asked to meet at the conference to talk about pairwise and combinatorial testing methods and to share my findings that, in the dozens of projects I’ve been involved with that compared testers’ efficiency and effectiveness, I’ve routinely seen defects found per tester hour more than double.
9. I was interested in getting his insights: Where are these methods most applicable? Least applicable? What have his experiences been in teaching combinatorial testing methods to students?
In short, frankly, my goals in meeting with him were to: (a) meet someone new, interesting, and knowledgeable, and learn as much as I could from his experiences, his impressive critical thinking, and his questioning nature, and (b) avoid tripping up with sloppy reasoning (while unapologetically expressing the reasons I feel combinatorial testing methods are dramatically under-appreciated by the software testing community) in front of someone who (i) can smell BS a mile away and (ii) doesn’t suffer fools gladly.
I learned a lot, heard some fantastic war stories, and heard excellent counter-examples that disproved a couple of the generalizations I was making (but didn’t dampen my unshaken assertion that combinatorial testing methods are wildly under-utilized by the software testing community). I thoroughly enjoyed the experience. Moving forward, as a result of our meeting, I will go through an exercise that will make me more effective: carefully thinking through and enumerating all of the assumptions behind statements of mine like “I’ve measured the effectiveness of testers dozens of times – trying to control external variables as much as reasonably possible – and I’m consistently seeing more than twice as many defects per tester hour when testers adopt pairwise/combinatorial testing methods.”
The compliment he paid me last night was private, so I won’t share it, but it ranks up there among the all-time favorite compliments I’ve ever received. I’m honored. Thanks, James.
Matthew Heusser – an accomplished tester, frequent blogger, insightful contributor to the Context-Driven Testing mailing list, and a testing expert whose opinion I respect a lot – has just published a very thought-provoking blog post that highlights an important issue surrounding “PowerPointy” consultants in the testing industry who have relatively weak real-world testing chops. It’s called “The Fishing Maturity Model.”
Matthew argues that testers are well-advised to be skeptical of self-described testing experts who claim to “have the answer” – particularly when such “experts” haven’t actually rolled their sleeves up and done software testing themselves. I found his post thought-provoking, particularly because it hit close to home: while I’m by no means a testing expert in the broader sense of the term, I do consider myself to know enough about combinatorial test design strategies applicable to software testing to be able to help most testing teams become demonstrably more efficient and effective… and yet my actual hands-on testing experience is admittedly quite limited. Even though I’m not one of the guys he’s (justifiably) skewering with his funny and well-reasoned post (and he assures me I’m not; see below), a tester could certainly be forgiven for mistaking me for one based on my past experiences.
Matthew’s Five Levels of the Fishing Maturity Model (based, not so loosely of course, on the Testing Maturity Model, not to mention CMM and CMMi)…
The five levels of the fishing maturity model:
1 – Ad-hoc. Fishing is an improvised process.
2 – Planned. The location and timing of our ships is planned. With a knowledge of how we did for the past two weeks, knowing we will go to the same places, we can predict our shrimp intake.
3 – Managed. If we can take the shrimp fishing process and create standard processes – how fast to drive the boat, and how deep to let out the nets, how quickly, etc, we can improve our estimates over time, more importantly.
4 – Measured. We track our results over time – to know exactly how many pounds of shrimp are delivered at what time with what processes.
5 – Optimizing. At level 5, we experiment with different techniques; to see what gathers more shrimp and what does not. This leads us to continual improvement.
Sounds good, right? Why, with a little work, this would make a decent 1-hour conference presentation. We could write a little book, create a certification, start running conferences …
And the rub…
I’ve never fished with nets in my entire life. In fact, the last time I fished with a pole, I was ten years old at Webelo’s camp.
I posted the following response, based on my personal experiences. Words in [brackets] are Matthew’s responses to me.
Excellent post, as usual. [I’m glad you like it. Thank you.]
You raise very good points. Testers (and other IT executives) should be leery of snake oil salesmen and use their judgment about “experts” who lack practical hands-on experience. While I completely agree with this point, I offer up my own experiences as a “counter-example” to the problem you pointed out here.
3-4 years ago, while I was working at a management consulting and IT company (with a personal background as an entrepreneur, lawyer, and management consultant – not in software testing), I began to recommend to any software testers who would listen that they start using a different approach to designing their test cases. Specifically, I was recommending that testers begin using applied statistics-based methods* designed to maximize coverage in a small number of tests, rather than continuing to manually select test cases and rely on SMEs to identify combinations of values that should be tested together. You could say I was recommending that they adopt what I consider to be (in many contexts) a “more mature” test design process.
The reaction I got from many teams was, as you say, “this whole thing smells fishy to me” (or some more polite version of the rebuttal “Why in the world should I, with my years of experience in software testing, listen to you – a non-software tester?”). Here’s the thing: when teams did use the applied statistics-based testing methods I recommended, they consistently saw large reductions in the time it took them to identify and document tests (often 30-40%), and they often saw huge leaps in productivity (e.g., finding more than twice as many defects per tester hour). In each proof-of-concept pilot, we measured these carefully by having two separate teams – one using “business as usual” methods, the other using pairwise or orthogonal array-based test design strategies – test the same application. Those dramatic results led to my decision to create Hexawise, a software test design tool. [Point Taken …]
My closing thoughts related to your post boil down to:
1) I agree with your comment – “There are a lot of bogus ideas in software development.”
2) I agree that testers shouldn’t accept fancy PowerPointed ideas like “this new, improved method/model/tool will solve all your problems.”
3) I agree that testers should be especially skeptical when the person presenting those PowerPointed slides hasn’t rolled up their sleeves for years as a software testing practitioner.
4) Some consultants who lack software testing experience actually are capable of making valuable recommendations to software testers about how they can improve their efficiency and effectiveness. It would be a mistake to write them off as charlatans because of their lack of software testing experience. [I agree with the sentiment that sometimes, people out of the field can provide insight. I even hinted at that with the comment that at least, Forrest should listen, then use his discernment on what to use. I’m not entirely ready to, as the expression goes, throw the baby out with the bathwater.]
5) Following the “bogus ideas” link above takes readers to your quote that: “When someone tells you that your organization has to do something ‘to stay competitive,’ but he or she can’t provide any direct link to sales, revenue, reduced expenses, or some other kind of money, be leery.” I enthusiastically agree. In the software testing community, in my view, we do not focus enough on gathering real data** about which approaches work (or -ideally- in what contexts they work). A more data-driven management approach would help everyone understand what methods and approaches deliver real, tangible benefits in a wide variety of contexts vs. those methods and approaches that look good on paper but fall short in real-world implementations. [Hey man, you can back up your statements with evidence, and you’re not afraid to roll up your sleeves and enter an argument. I may not always agree with you, but you’re exactly the kind of person I want to surround myself with, to keep each other sharp. Thank you for the thoughtful and well reasoned comment.]
*I use the term “applied statistics-based testing” to cover pairwise, orthogonal array-based, and more comprehensive combinatorial test design methods such as n-wise testing (which can capture, for example, all possible valid combinations of values for any 6 test inputs).
**Here is an article I co-wrote which provides some solid data showing that applied statistics-based testing methods can more than double the number of defects found per tester hour (and simultaneously result in finding more defects overall) as compared to testing that relies on “business as usual” methods during the test case identification phase.
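As a footnote to those footnotes, in case it helps to see what higher-strength (n-wise) coverage targets look like numerically, here is a small, purely hypothetical calculation; the parameter counts are made up for illustration and do not come from any particular project.

```python
from math import comb

# Purely hypothetical system: 10 test inputs ("parameters"), 4 values each.
parameters, values, t = 10, 4, 6

exhaustive = values ** parameters                    # every complete combination: 1,048,576
six_way_targets = comb(parameters, t) * values ** t  # distinct 6-way combinations: 860,160

print(f"Exhaustive tests required:   {exhaustive:,}")
print(f"6-way combinations to cover: {six_way_targets:,}")
# A strength-6 covering array packs a couple of hundred of those 6-way targets
# into every single test, so the number of tests needed is far smaller than
# either figure above.
```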