13 Great Questions To Ask Software Testing Tool Vendors

There are good reasons James Bach is so well known among the testing community and constantly invited to give keynote presentations at software testing conferences around the globe. He’s passionate about testing and educating testers; he’s a gifted, energetic, and entertaining speaker with a great sense of humor; and he takes joy in rattling his saber and attacking well-established institutions and schools of thought that he disagrees with. He doesn’t take kindly to people who make inflated claims of benefits that would materialize “if only you’d perform testing in XYZ way or with ABC tool,” given that (a) he always seems to find exceptions to such claims, (b) he doesn’t shy away from confrontation, and (c) he (rightly, in my view) thinks that such benefit statements tend to discount the importance of testers’ critical thinking skills and of other important context-specific considerations.

Leave it up to James to create a list of 13 questions that would be great to ask the next software testing tool vendor who shows up to pitch a problem-solving product. In his blog post titled “The Essence of Heuristics,” he posed this exact set of questions in a slightly different context, but as a software testing tool vendor myself, I found they really hit home. They are:

1. Do they teach you how to tell if it’s working?
2. Do they teach you how to tell if it’s going wrong?
3. Do they teach you heuristics for stopping?
4. Do they teach you heuristics for knowing when to apply it?
5. Do they compare it to alternative heuristics?
6. Do they show you why it works?
7. Do they help you understand when it probably works best?
8. Do they help you know how to re-design it, if needed?
9. Do they let you own it?
10. Do they ask you to practice it?
11. Do they tell stories about how it has failed?
12. Do they listen to you when you question or challenge it?
13. Do they praise you for questioning and challenging it?

[Side note: Apparently I wasn’t the only one who thought of Hexawise and pairwise / combinatorial test design approaches when they saw these 13 questions. I was amused to see Jared Quinert’s (@xflibble’s) tweet on the same theme just after I drafted this post.]

Where do I come down on each of James’ 13 questions with respect to the people I talk to about our test design tool, Hexawise, and the types and sizes of benefits it typically delivers? Quite simply: “Yes” to all 13. I enjoy talking about exactly the kinds of questions James raised in his list. In fact, when I sought out James at a conference in Boston earlier this year, it was because I wanted his perspective on many of the points above, particularly #11 (hearing stories about how he has seen pairwise and combinatorial approaches to test design fail) and #7 (hearing his views on where these approaches work best and where they would be difficult to apply). I’ll save my specific answers for another post, but I am serious about wanting to share my thoughts on them; time constraints are holding me back today. I did, though, give a speech at the ASQ World Conference on Quality Improvement in St. Louis last week that addressed many, but not all, of James’ questions.

I’m not your typical software tool vendor. Basically, my natural instincts are all wrong for sales: I agree with the premise that “a fool with a tool is still a fool”; when talking to target clients and/or potential partners, I’m inclined to point out deficiencies, limitations, and various things that could go wrong; and I’m more of an introvert than an extrovert. Not exactly the typical characteristics of a successful salesman… Having said that, I believe we’ve built a very good tool that enables dramatic efficiency and thoroughness benefits in many testing situations, but our tool, along with the pairwise and combinatorial test design approaches it supports, has its limitations. It is primarily by talking to software testers about their positive and negative experiences that our company is able to improve our tool, enhance our training, and provide honest, pragmatic guidance to users about where and how to use our tool (and where and how not to).

Tool vendors who defend their tools (and/or the approaches by which their tools help users solve problems) as magical, silver-bullet solutions are being both foolish and dishonest. Tool vendors who choose not to engage in serious, honest, and open discussions with users about the challenges those users face when applying their tools in different situations are being short-sighted. From my own experience, I can say that talking about the 13 topics raised by James has been invaluable.

Why isn’t Software Testing Performed as Efficiently and Effectively as it could be?

Luis Fernández, an associate professor at the Universidad de Alcalá, is conducting a survey of software testers to gather data on questions such as “Why isn’t software testing conducted as efficiently and effectively as it should be?” and “What factors lead to software testing being ‘under-appreciated’ as a potential career path?”

His survey (as of March 2010) is available here: http://www.cc.uah.es/encuestas/index.php?sid=28392&lang=en

Personally, I agree that the following two issues (identified in his survey) are significant causes of inefficiency in software testing:

1) “People tend to execute testing in an uncontrolled manner until the total expenditure of resources in the belief that if we test a lot, in the end, we will cover or control all the system.”

(Or, at least: given the relatively undisciplined test case selection methods prevalent in the industry, my experience analyzing manually selected test scenarios is that testers generally (a) believe they are covering a higher proportion of an application’s possible combinations than they actually are, and (b) underestimate how much test execution time is spent unproductively repeating steps that have already been tested. The short sketch after the links below illustrates point (a).)

2) “Many managers did not receive appropriate training on software testing so they do not appreciate its interest or potential for efficiency and quality.”

It is unfortunate, but true, that many testing managers have no background whatsoever in combinatorial testing methods, which (a) dramatically reduce the amount of time it takes to select and document test cases and (b) simultaneously improve test execution efficiency when applied correctly. See, for example, https://www.hexawise.com/Combinatorial-Softwar-Testing-Case-Studies-IEEE-Computer-Kuhn-Kacker-Lei-Hunter.pdf

See also: http://www.slideshare.net/JustinHunter/efficient-and-effective-test-design
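
To make point (a) concrete, here is a minimal Python sketch that measures what fraction of all parameter-value pairs a hand-picked test set actually exercises. The parameters and the three “manually selected” tests below are hypothetical, and this is not any particular vendor’s algorithm:

```python
from itertools import combinations, product

# Hypothetical system under test: three parameters and their values.
parameters = {
    "browser": ["IE", "Firefox", "Chrome"],
    "os":      ["Windows", "macOS", "Linux"],
    "user":    ["admin", "guest"],
}

# Hypothetical manually selected test cases (parameter -> chosen value).
manual_tests = [
    {"browser": "IE",      "os": "Windows", "user": "admin"},
    {"browser": "Firefox", "os": "Windows", "user": "guest"},
    {"browser": "Chrome",  "os": "Linux",   "user": "admin"},
]

def pairs_in(test):
    """All (parameter, value) pairs exercised together by a single test."""
    items = sorted(test.items())
    return set(combinations(items, 2))

# Enumerate every pair of values that could be tested together.
names = sorted(parameters)
all_pairs = set()
for a, b in combinations(names, 2):
    for va, vb in product(parameters[a], parameters[b]):
        all_pairs.add(((a, va), (b, vb)))

covered = set().union(*(pairs_in(t) for t in manual_tests))
print(f"pair coverage: {len(covered)} of {len(all_pairs)} "
      f"({100 * len(covered) / len(all_pairs):.0f}%)")
```

With these made-up inputs, the three hand-picked tests cover only 9 of the 21 possible pairs (43%), illustrating how actual coverage can be far lower than intuition suggests.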

Please consider taking Fernández’s short survey. It takes only 5-10 minutes to complete.

Cem Kaner: Testing Checklists = Good / Testing Scripts = Bad?

I highly recommend this presentation by Cem Kaner (available here as a pdf download of the slides). It is provocative, funny, and insightful. In it, Cem Kaner makes a strong case for using checklists (and mercilessly derides many aspects of using completely scripted tests). Cem Kaner, as I suspect most people reading this already know, is one of the leading lights of software testing education. He is a professor of computer science at the Florida Institute of Technology and has contributed enormously to software testing education by writing Testing Computer Software, “the best selling software testing book of all time,” founding the Center for Software Testing Education & Research, and making an excellent free course on Black Box Software Testing available online. <Trivia: Cem Kaner is one of two people I know of who work in software testing today and have a law degree; the other is me. After graduating from the University of Virginia Law School, I worked as a lawyer in London and Hong Kong for a large global firm before coming to my senses and realizing my interests, happiness, and competence lay elsewhere.>

Here are a couple of my favorite slides from the presentation.

My own belief is that the presentation is very good and makes its points well. If I have a minor quibble, it is this: in doing such a good job of laying out the case for checklists and against scripted testing, the presentation – almost by definition/design – does not go into as much detail as I would personally like about a topic I think is extremely important and not written about enough; namely, how practitioners should blend the advantages of scripted tests (which can deliver some of the huge efficiency benefits of combinatorial testing methods, for example) with the advantages of checklist-based Exploratory Testing (which the presentation points out so well). A “both / and” option is not only possible; it is desirable.

– – –

Credit for bringing this presentation to my attention goes to Michael Bolton (the testing expert, of course, not the singer lampooned in “Office Space”), who posted a link to it. Thanks again, Michael. Your enthusiastic recommendation to pick up boxed sets of the BBC show Connections was excellent as well; the presenter of Connections is like a slightly tipsy genius with ADHD who possesses an incredible grasp of history, an encyclopedic knowledge of quirky scientific developments, and a gift for story-telling. I like how your mind works.

“I feel honored…”

There are some phrases in English that, as often as not, come off sounding obligatory and/or insincere. The phrase “I’m honored…” comes to mind (particularly if someone is accepting an award in front of a room full of people).

Be that as it may, I genuinely felt honored last night, and again today, by a couple of comments James Bach made about me.

Here’s the quick background:

1. James knows much more about software testing than I do, and I respect his views a lot.
2. He has a reputation for not suffering fools gladly and for quite bluntly telling people when he doesn’t respect the content of their views.
3. In addition to his extremely broad expertise on “testing in general,” James, like Michael Bolton, knows a lot about pairwise and combinatorial testing methods and how to use them.
4. I firmly (and passionately) believe that pairwise and combinatorial testing methods are (a) dramatically under-appreciated and (b) dramatically under-utilized.
5. James has published a very good, well-reasoned article about some of the limitations of pairwise testing methods, which I wanted to talk to him about.
6. I co-wrote an article about combinatorial testing that IEEE Computer recently published, which I also wanted to discuss with him.
7. James and I have both been at the STP Conference in Boston over the past few days.
8. I reached out and asked to meet at the conference to talk about pairwise and combinatorial testing methods and to share a finding from the dozens of projects I’ve been involved with that compared testers’ efficiency and effectiveness: I’ve routinely seen defects found per tester hour more than double.
9. I wanted his insights into where these methods are most applicable, where they are least applicable, and what his experiences have been in teaching combinatorial testing methods to students.

In short, my goals in meeting with him were to: (a) meet someone new, interesting, and knowledgeable, and learn as much as I could from his experiences, his impressive critical thinking, and his questioning nature; and (b) avoid tripping up with sloppy reasoning (while unapologetically expressing the reasons I feel combinatorial testing methods are dramatically under-appreciated by the software testing community) in front of someone who (i) can smell BS a mile away and (ii) doesn’t suffer fools gladly.

I learned a lot, heard some fantastic war stories, and heard excellent counter-examples that disproved a couple of the generalizations I was making (though they didn’t shake my assertion that combinatorial testing methods are wildly under-utilized by the software testing community). I thoroughly enjoyed the experience. Moving forward, as a result of our meeting, I will go through an exercise that should make me more effective: carefully thinking through and enumerating all of the assumptions behind statements like “I’ve measured the effectiveness of testers dozens of times – trying to control external variables as much as reasonably possible – and I’m consistently seeing more than twice as many defects per tester hour when testers adopt pairwise/combinatorial testing methods.”

His compliment last night was private, so I won’t share it, but it ranks among the all-time favorite compliments I’ve ever received. I’m honored. Thanks, James.

Matthew Heusser’s Funny and Insightful “Fishing Maturity Model” Analogy


Matthew Heusser – an accomplished tester, frequent blogger, insightful contributor to the Context-Driven Testing mailing list, and a testing expert whose opinion I respect a lot – has just published a very thought-provoking blog post about “PowerPointy” consultants in the testing industry who have relatively weak real-world testing chops. It’s called “The Fishing Maturity Model.”

Matthew argues that testers are well-advised to be skeptical of self-described testing experts who claim to “have the answer” – particularly when such “experts” haven’t actually rolled up their sleeves and done software testing themselves. His post hit close to home: while I’m by no means a testing expert in the broader sense of the term, I do consider myself to know enough about combinatorial test design strategies to help most testing teams become demonstrably more efficient and effective… and yet my actual hands-on testing experience is admittedly quite limited. If I’m not one of the guys he’s (justifiably) skewering with his funny and well-reasoned post (and he assures me I’m not; see below), a tester could certainly be forgiven for mistaking me for one, given my background.

Matthew’s five levels of the Fishing Maturity Model (based, not so loosely, of course, on the Testing Maturity Model, not to mention CMM and CMMi)…

The five levels of the fishing maturity model:
1 – Ad-hoc. Fishing is an improvised process.
2 – Planned. The location and timing of our ships is planned. With a knowledge of how we did for the past two weeks, knowing we will go to the same places, we can predict our shrimp intake.
3 – Managed. If we can take the shrimp fishing process and create standard processes – how fast to drive the boat, and how deep to let out the nets, how quickly, etc, we can improve our estimates over time, more importantly.
4 – Measured. We track our results over time – to know exactly how many pounds of shrimp are delivered at what time with what processes.
5 – Optimizing. At level 5, we experiment with different techniques; to see what gathers more shrimp and what does not. This leads us to continual improvement.

Sounds good, right? Why, with a little work, this would make a decent 1-hour conference presentation. We could write a little book, create a certification, start running conferences …

And the rub…

The problem:
I’ve never fished with nets in my entire life. In fact, the last time I fished with a pole, I was ten years old at Webelo’s camp.

I posted the following response, based on my personal experiences. Words in [brackets] are Matthew’s responses to me.

Matthew,

Excellent post, as usual.  [I’m glad you like it. Thank you.]

You raise very good points. Testers (and other IT executives) should be leery of snake oil salesmen and use their judgment about “experts” who lack practical hands-on experience. While I completely agree with this point, I offer up my own experiences as a “counter-example” to the problem you pointed out here.

3-4 years ago, while I was working at a management consulting and IT company (with a personal background as an entrepreneur, lawyer, and management consultant – not in software testing), I began to recommend to any software testers who would listen that they start using a different approach to designing their test cases. Specifically, I recommended that testers begin using applied statistics-based methods* designed to maximize coverage in a small number of tests, rather than continuing to manually select test cases and rely on SMEs to identify combinations of values that should be tested together. You could say I was recommending that they adopt what I consider to be (in many contexts) a “more mature” test design process.

The reaction I got from many teams was, as you say, “this whole thing smells fishy to me” (or some more polite version of the rebuttal “Why in the world should I, with my years of experience in software testing, listen to you – a non-software tester?”). Here’s the thing: when teams did use the applied statistics-based testing methods I recommended, they consistently saw large reductions in how long it took them to identify and document tests (often 30-40%), and they often saw huge leaps in productivity (e.g., finding more than twice as many defects per tester hour). In each proof-of-concept pilot, we measured these results carefully by having two separate teams – one using “business as usual” methods, the other using pairwise or orthogonal array-based test design strategies – test the same application. Those dramatic results led to my decision to create Hexawise, a software test design tool. [Point Taken …]

My closing thoughts related to your post boil down to this:

1) I agree with your comment – “There are a lot of bogus ideas in software development.”

2) I agree that testers shouldn’t accept fancy PowerPointed ideas like “this new, improved method/model/tool will solve all your problems.”

3) I agree that testers should be especially skeptical when the person presenting those PowerPointed slides hasn’t rolled up their sleeves for years as a software testing practitioner.

Even so…

4) Some consultants who lack software testing experience actually are capable of making valuable recommendations to software testers about how they can improve their efficiency and effectiveness. It would be a mistake to write them off as charlatans because of their lack of software testing experience.  [I agree with the sentiment that sometimes, people out of the field can provide insight. I even hinted at that with the comment that at least, Forrest should listen, then use his discernment on what to use. I’m not entirely ready to, as the expression goes, throw the baby out with the bathwater.]

5) Following the “bogus ideas” link above takes readers to your quote that: “When someone tells you that your organization has to do something ‘to stay competitive,’ but he or she can’t provide any direct link to sales, revenue, reduced expenses, or some other kind of money, be leery.” I enthusiastically agree. In the software testing community, in my view, we do not focus enough on gathering real data** about which approaches work (or -ideally- in what contexts they work). A more data-driven management approach would help everyone understand what methods and approaches deliver real, tangible benefits in a wide variety of contexts vs. those methods and approaches that look good on paper but fall short in real-world implementations.  [Hey man, you can back up your statements with evidence, and you’re not afraid to roll up your sleeves and enter an argument. I may not always agree with you, but you’re exactly the kind of person I want to surround myself with, to keep each other sharp. Thank you for the thoughtful and well reasoned comment.]

– Justin

Company – http://www.hexawise.com
Blog – https://hexawise.wordpress.com
Forum – http://testing.stackexchange.com

*I use the term “applied statistics-based testing” to cover pairwise, orthogonal array-based, and more comprehensive combinatorial test design methods such as n-wise testing (which can capture, for example, all possible valid combinations involving any six values).

**Here is an article I co-wrote that provides solid data showing that applied statistics-based testing methods can more than double the number of defects found per tester hour (and simultaneously result in finding more defects in total), as compared to testing that relies on “business as usual” methods during the test case identification phase.

Efficient and Effective Test Design

I enjoyed talking about efficient and effective combination testing strategies (and highlights of a recent empirical study) at yesterday’s TISQA meeting, together with Lester Bostic of Blue Cross Blue Shield of North Carolina, who shared his team’s experiences adopting a combinatorial testing approach. The presentation addresses how tools like Hexawise can help software testers quickly identify the test cases they should execute to find as many defects as possible with as few tests as possible. I wanted to share it now; once I have more time, I will comment on it and highlight some of the good questions, comments, discussion points, and tester experiences raised by the attendees.

The presentation focused on combinatorial testing techniques, such as pairwise testing, orthogonal array-based testing methods, and more thorough combination testing strategies capable of identifying every defect that could be triggered by, say, any possible combination of three or four “things” you’ve decided to test for (regardless of whether those “things” are features, configuration options, equivalence classes of data, types of users, or a mix of each).
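
As a rough illustration of what those coverage strengths mean in practice, the short sketch below counts how many distinct t-way value combinations a 2-way, 3-way, or 4-way test set must cover. The per-parameter value counts are hypothetical, not taken from the presentation:

```python
from itertools import combinations
from math import prod

def required_tuples(value_counts, t):
    """Count the distinct t-way value combinations a t-wise suite must cover."""
    return sum(prod(group) for group in combinations(value_counts, t))

value_counts = [3, 3, 2, 4, 2]  # hypothetical: number of values per parameter
for t in (2, 3, 4):
    print(f"{t}-way combinations to cover: {required_tuples(value_counts, t)}")
```

The counts grow quickly with t, which is why higher-strength suites need more tests – but still vastly fewer than exhaustive testing.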

The middle of the presentation highlights empirical evidence that this method of identifying test cases often has an enormous impact on how quickly software testers find defects; citing the IEEE Computer article on combinatorial testing that I co-wrote last month, this approach led, on average, to more than twice as many defects found per tester hour.

The final section of the presentation was delivered by Lester Bostic of Blue Cross Blue Shield and covered his lessons learned. Lester used Hexawise to reduce the 1,356,136,048,589,996,428,428,909,447,344,392,489,476,985,674,792,960 possible tests (that would have been necessary to achieve comprehensive testing of the application he was testing) to only 220 tests, which proved to be extremely effective at identifying defects. <Side note: No, that absurdly large number with 51 digits after the “1” is not a typo; it makes me smile every time I see it.>
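
For readers wondering where a 52-digit number comes from: the count of exhaustive tests is simply the product of each parameter’s value count, so it explodes as parameters are added. A back-of-the-envelope sketch, using hypothetical parameter counts rather than Lester’s actual model:

```python
import math

# Hypothetical system under test: 40 parameters with ~20 values each.
values_per_parameter = [20] * 40
exhaustive = math.prod(values_per_parameter)  # 20**40, on the order of 1e52
print(f"tests needed for exhaustive coverage: {exhaustive:,}")
```

Pairwise and n-wise test design gets its leverage from covering all the small interactions in that space with a tiny, carefully chosen subset of tests.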

Comments and questions are welcome.

What Else Can Software Development and Testing Learn from Manufacturing? Don’t Forget Design of Experiments (DoE).


Tony Baer from Ovum recently wrote a blog post titled “Software Development is like Manufacturing,” which included the following quotes:

“More recently, debate has emerged over yet another refinement of agile – Lean Development, which borrows many of the total quality improvement and continuous waste reduction principles of lean manufacturing. Lean is dedicated to elimination of waste, but not at all costs (like Six Sigma). Instead, it is about continuous improvement in quality, which will lead to waste reduction….

In essence, developing software is like making a durable good like a car, appliance, military transport, machine tool, or consumer electronics product…. you are building complex products that are expected to have a long service life, and which may require updates or repairs.”

Here are my views: I see valid points on both sides of the debate. Rather than weigh general, high-level pros and cons, though, I would like to zero in on an important topic that is all too often missing from the debate. Specifically, Design of Experiments has been central to Six Sigma, Lean Manufacturing, the Toyota Production System, and Deming’s quality improvement approaches, and it is equally applicable to software development and testing; yet adoption of Design of Experiments methods in software design and testing remains low. This is unfortunate because significant benefits consistently result, in both software development and software testing, when Design of Experiments methods are properly implemented.

What are Design of Experiments Methods and Why are they Relevant?

In short, Design of Experiments methods are a proven approach to creating and managing experiments: variables are altered intelligently between test runs in a structured way that allows the experimenter to learn as much as possible from as few experiments as possible. From Wikipedia: “Design of experiments, or experimental design, (DoE) is the design of all information-gathering exercises where variation is present, whether under the full control of the experimenter or not. Often the experimenter is interested in the effect of some process or intervention (the “treatment”) on some objects (the “experimental units”).”

Design of Experiments methods are an important aspect of Lean Manufacturing, Six Sigma, the Toyota Production System, and other manufacturing-related quality improvement approaches and philosophies. Not only have Design of Experiments methods been very important to all of the above in manufacturing settings; they are also directly relevant to software development. By way of example, W. Edwards Deming, who was extremely influential in quality initiatives in manufacturing in Japan and the U.S., was an applied statistician. He and thousands of other highly respected quality executives in manufacturing, including Box, Juran, and Taguchi (and even my dad), have regularly used Design of Experiments methods as a fundamental anchor of quality improvement and QA initiatives. And yet relatively few people who write about software development seem to be aware that these methods exist.

What Benefits are Delivered in Software Development by Design of Experiments-based Tools?

Application optimization tools like Google’s Website Optimizer are a good example of how Design of Experiments methods can deliver powerful benefits in the software development process. Website Optimizer allows users to easily vary multiple aspects of web pages (images, descriptions, fonts, colors, logos, etc.) and capture the results of user actions to identify which combinations work best. A recent YouTube multivariate experiment (i.e., an experiment created using Design of Experiments methods) shows how the team used this simple tool to increase sign-up rates by 15.7%. The experiment involved 1,024 variations.
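
To see why the experiment reached 1,024 variations: ten page elements with two variants each produce 2^10 combinations. A tiny sketch with made-up element names (this is not YouTube’s actual experiment design):

```python
from itertools import product

# Hypothetical page: ten elements, each with two candidate variants.
page_elements = {f"element_{i}": ["variant_a", "variant_b"] for i in range(10)}

variations = list(product(*page_elements.values()))
print(f"full-factorial page variations: {len(variations)}")  # 1024
```

A Design of Experiments approach then either runs all 1,024 variations (a full factorial, when traffic allows) or selects a much smaller fractional design that still isolates each element’s effect.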

What Benefits are Delivered in Software Testing by Design of Experiments-based Tools?

In addition, software test design tools, like the Hexawise test design tool my company created, enable dramatically more efficient software testing by automatically varying the elements of the use cases being tested in order to achieve optimal coverage. Users input the things in the application they want to test and push a button; as in the Google Website Optimizer example, the tool uses DoE algorithms to identify how the tests should be constructed to maximize efficiency and thoroughness. A recent IEEE Computer article I contributed to, titled “Combinatorial Testing,” shows that, on average over the course of 10 separate real-world projects, tester productivity (measured in defects found per tester hour) more than doubled, as compared to control groups that continued to use their standard manual methods of test case selection: http://tinyurl.com/nhzgaf
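
For intuition about what such a tool does under the hood, here is a toy greedy pairwise generator sketched from first principles. It is emphatically not Hexawise’s algorithm (real tools use far more scalable techniques than enumerating the full Cartesian product), and the parameters are hypothetical:

```python
from itertools import combinations, product

def pairwise_tests(parameters):
    """Greedily pick tests until every parameter-value pair is covered.

    Enumerates the full Cartesian product of values, so it is exponential
    in the number of parameters -- fine for a toy, unusable at real scale.
    """
    names = sorted(parameters)
    # Every ((param_a, val_a), (param_b, val_b)) pair that must be covered.
    uncovered = set()
    for a, b in combinations(names, 2):
        for va, vb in product(parameters[a], parameters[b]):
            uncovered.add(((a, va), (b, vb)))

    candidates = [dict(zip(names, vals))
                  for vals in product(*(parameters[n] for n in names))]
    tests = []
    while uncovered:
        def gain(test):
            items = [(n, test[n]) for n in names]
            return sum(1 for p in combinations(items, 2) if p in uncovered)
        best = max(candidates, key=gain)  # the test covering the most new pairs
        items = [(n, best[n]) for n in names]
        uncovered -= set(combinations(items, 2))
        tests.append(best)
    return tests

params = {
    "browser": ["IE", "Firefox", "Chrome"],
    "os":      ["Windows", "macOS", "Linux"],
    "user":    ["admin", "guest"],
}
tests = pairwise_tests(params)
print(f"{len(tests)} pairwise tests instead of {3 * 3 * 2} exhaustive ones")
for t in tests:
    print(t)
```

Even this naive version cuts the 3 x 3 x 2 = 18 exhaustive tests roughly in half while still pairing every value with every other value at least once, and the savings grow enormously as parameters are added.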

Design of Experiments methods – among the most powerful methods in Lean Manufacturing, Six Sigma, and the Toyota Production System – are not yet widely adopted in the software development industry. This is unfortunate for two reasons, namely:

  1. Design of Experiments methods will consistently deliver measurable benefits when implemented correctly, and
  2. Sophisticated new tools designed with very straightforward user interfaces make it easier than ever for software developers and testers to begin using these helpful methods.

– Justin