Why isn’t Software Testing Performed as Efficiently and Effectively as it could be?

Luis Fernández, an associate professor at Universidad de Alcalá, is conducting a survey of software testers to gather data on questions such as “Why isn’t software testing conducted as efficiently and effectively as it should be?” and “What factors lead to software testing being ‘under-appreciated’ as a potential career path?”

His survey (as of March 2010) is available here: http://www.cc.uah.es/encuestas/index.php?sid=28392&lang=en

Personally, I agree that the following two issues (identified in his survey) are significant causes of inefficiency in software testing:

1) “People tend to execute testing in an uncontrolled manner until the total expenditure of resources in the belief that if we test a lot, in the end, we will cover or control all the system.”

(Or, at least, given the relatively undisciplined test case selection methods prevalent in the industry, my experience analyzing manually selected test scenarios is that testers generally (a) believe they are covering a higher proportion of an application’s possible combinations than they actually are and (b) underestimate how much of their test execution time is spent unproductively repeating steps they have already tested.)

2) “Many managers did not receive appropriate training on software testing so they do not appreciate its interest or potential for efficiency and quality.”

It is unfortunate, but true, that many testing managers have no background whatsoever in combinatorial testing methods that (a) dramatically reduce the time it takes to select and document test cases and (b) simultaneously improve test execution efficiency when applied correctly. See, for example, https://www.hexawise.com/Combinatorial-Softwar-Testing-Case-Studies-IEEE-Computer-Kuhn-Kacker-Lei-Hunter.pdf

See also: http://www.slideshare.net/JustinHunter/efficient-and-effective-test-design
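One way to check claim (a) in the first issue above against your own test set is to count how many of the possible parameter-value pairs a hand-picked set of tests actually exercises, and how many tests add no new pair at all. The Python sketch below is a minimal illustration under stated assumptions: each test is modeled as a dict assigning one value to every parameter, "coverage" means two-way (pairwise) coverage only, a test that adds no new pair is treated as pure repetition, and the function name pair_coverage and the browser/OS/user model are hypothetical.

```python
from itertools import combinations, product


def pair_coverage(parameters, tests):
    """Measure how much pairwise coverage a hand-picked test set achieves.

    parameters: dict mapping each parameter name to its list of values.
    tests: list of dicts, each assigning one value to every parameter.
    Returns (pairs_covered, pairs_possible, redundant_tests), where a
    redundant test is one that adds no value pair not already covered.
    """
    names = list(parameters)
    # Every value pair that could appear for every pair of parameters.
    possible = {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va, vb in product(parameters[a], parameters[b])
    }
    covered = set()
    redundant = 0
    for test in tests:
        pairs = {(a, test[a], b, test[b]) for a, b in combinations(names, 2)}
        if pairs <= covered:
            redundant += 1  # this test only repeats pairs seen earlier
        covered |= pairs
    return len(covered & possible), len(possible), redundant


# Hypothetical model and hand-picked tests, for illustration only.
model = {
    "browser": ["Chrome", "Firefox"],
    "os": ["Windows", "Mac"],
    "user": ["admin", "guest"],
}
manual_tests = [
    {"browser": "Chrome", "os": "Windows", "user": "admin"},
    {"browser": "Chrome", "os": "Windows", "user": "guest"},
    {"browser": "Chrome", "os": "Mac", "user": "admin"},
    {"browser": "Chrome", "os": "Windows", "user": "admin"},  # exact repeat
]
print(pair_coverage(model, manual_tests))  # (7, 12, 1)
```

In this made-up example, four tests feel like a reasonable effort, yet they exercise only 7 of the 12 possible pairs (no Firefox combination is touched at all), and one of the four tests repeats an earlier one exactly.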

Please consider taking Fernández’s short survey. It takes only 5-10 minutes to complete.

What is Agile? What is not Agile?

An unusually hectic work-schedule has been keeping me hopping lately.  I returned this weekend from a great two-week trip to the UK in which I visited with 5 testing teams using our Hexawise tool to design test cases for applications being used in two banks, a consulting and systems integration firm, a grocery store chain, and a telecoms company.

Every product manager worth his or her salt will tell you it is a good idea to go meet with customers, listen to them, and watch them as they use your application.  Even though everyone I know agrees with this, I find it difficult to make happen as regularly as I would like to.  This trip reminded me how valuable in-depth customer interactions can be.  The two weeks of on-site visits with testing teams proved to be a great way to: (a) reconnect with customers, (b) get actionable input about what users like and don’t like about our tool, (c) identify new ways to keep refining our tool, and even (d) understand a couple of unexpected ways teams are using it.

Bret Pettichord’s tweets on “What is Agile?” / “not Agile?” prompted me to write this quick post.  I like them a lot.

When we first created our Hexawise tool, we followed the 4 steps Bret lays out in his description of “What is Agile?”  My experience in the UK over the last two weeks was the start of one of many “Repeat” cycles.

I admire people who can succinctly distill wisdom into bite-sized quips, as Bret did with his two tweets.  Another guy who excels at creating sound-bites is James Carville.  Love him or hate him, he has that skill in spades.  When I watched the movie “The War Room,” I felt like I was watching the “master of the sound-bite” in his element.  Me?  I’m more of a rambling, meandering, verbose communicator.  I’ve just taken 332 words and a screen shot with Bret’s tweets when all I set out to do in starting to write this post was to share Bret’s 32 words with you.

25 Great Quotes for Software Testers

All the quotes below are from the inside cover of Statistics for Experimenters, written by George Box, Stuart Hunter, and William G. Hunter (my late father).  The Design of Experiments methods described in the book (namely, the science of finding out as much information as possible in as few experiments as possible) were the inspiration behind our software test case generating tool.  In paging through the book again today, I found it striking (but not surprising) how many of these quotes are directly relevant to efficient and effective software testing (and to efficient and effective test case design strategies in particular):

  • “Discovering the unexpected is more important than confirming the known.”
  • “All models are wrong; some models are useful.”
  • “Don’t fall in love with a model.”
  • How, with a minimum of effort, can you discover what does what to what?  Which factors do what to which responses?
  • “Anyone who has never made a mistake has never tried anything new.” – Albert Einstein
  • “Seek computer programs that allow you to do the thinking.”
  • “A computer should make both calculations and graphs.  Both sorts of output should be studied; each will contribute to understanding.”  – F. J. Anscombe
  • “The best time to plan an experiment is after you’ve done it.” – R. A. Fisher
  • “Sometimes the only thing you can do with a poorly designed experiment is to try to find out what it died of.”  – R. A. Fisher
  • The experimenter who believes that only one factor at a time should be varied, is amply provided for by using a factorial experiment.
  • Only in exceptional circumstances do you need or should you attempt to answer all the questions with one experiment.
  • “The business of life is to endeavor to find out what you don’t know from what you do; that’s what I called ‘guessing what was on the other side of the hill.'”  – Duke of Wellington
  • “To find out what happens when you change something, it is necessary to change it.”
  • “An engineer who does not know experimental design is not an engineer.”  – Comment made to one of the authors by an executive of the Toyota Motor Company
  • “Among those factors to be considered there will usually be the vital few and the trivial many.”  – J. M. Juran
  • “The most exciting phrase to hear in science, the one that heralds discoveries, is not ‘Eureka!’ but ‘Now that’s funny…'” – Isaac Asimov
  • “Not everything that can be counted counts and not everything that counts can be counted.” – Albert Einstein
  • “You can see a lot by just looking.”  – Yogi Berra
  • “Few things are less common than common sense.”
  • “Criteria must be reconsidered at every stage of an investigation.”
  • “With sequential assembly, designs can be built up so that the complexity of the design matches that of the problem.”
  • “A factorial design makes every observation do double (multiple) duty.”  –  Jack Couden

Where the quotes are not attributed, I’m assuming they are from one of the authors.  The best known of the unattributed quotes above, “All models are wrong; some models are useful,” is widely (and accurately) attributed to George Box in particular.  I suspect most of the others are from George as well (as opposed to from Stu or my dad), although I forgot to confirm that with him when I saw him over Christmas break.  George is 90 now and still off-the-charts smart and funny, and he is probably the best storyteller I’ve met in my life.  If he were younger and on Twitter, he’d be one of those guys who churned out highly retweetable chestnuts again and again.

Related thoughts

As you know if you’ve read my blog before, I am a strong proponent of applying the Design of Experiments principles laid out in this book to the field of software testing to improve the efficiency and effectiveness of software test case design (e.g., by using pairwise software testing, orthogonal array software testing, and/or combinatorial software testing techniques).  In fact, I decided to create my company’s test case generating tool, Hexawise, after using Design of Experiments-based test design methods on a couple dozen projects during my time at Accenture and measuring dramatic improvements in tester productivity (as well as dramatic reductions in the amount of time it took to identify and document test cases).  We saw these improvements in every single pilot project in which we used these methods to identify tests.
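For readers who haven’t seen a pairwise test set built, here is a minimal greedy sketch in Python. It is not the algorithm Hexawise or any other particular tool uses; it simply keeps picking, from the full list of combinations, the test that covers the most not-yet-covered value pairs, until every pair of values for every pair of parameters appears in at least one test. The function names and the four-parameter model are hypothetical.

```python
from itertools import combinations, product


def pairs_in(names, test):
    """All (param, value, param, value) pairs exercised by one test."""
    return {(a, test[a], b, test[b]) for a, b in combinations(names, 2)}


def greedy_pairwise(parameters):
    """Build a test set that covers every value pair at least once.

    Greedy heuristic: repeatedly pick, from the full Cartesian product,
    the test that covers the most still-uncovered pairs. Enumerating the
    product is only practical for small models.
    """
    names = list(parameters)
    candidates = [dict(zip(names, combo)) for combo in product(*parameters.values())]
    uncovered = {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va, vb in product(parameters[a], parameters[b])
    }
    suite = []
    while uncovered:
        # Any uncovered pair lies in some candidate, so each pick makes progress.
        best = max(candidates, key=lambda t: len(pairs_in(names, t) & uncovered))
        suite.append(best)
        uncovered -= pairs_in(names, best)
    return suite


# Hypothetical model: four parameters with three values each (81 combinations).
model = {
    "browser": ["Chrome", "Firefox", "Safari"],
    "os": ["Windows", "Mac", "Linux"],
    "account": ["new", "existing", "closed"],
    "payment": ["card", "paypal", "invoice"],
}
suite = greedy_pairwise(model)
print(len(suite), "pairwise tests instead of 81 exhaustive ones")
```

Enumerating every combination, as this sketch does, only works for small models; the point is that the resulting set is a fraction of the 81 exhaustive combinations while still pairing every value with every other value at least once.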

My goal, in continuing to improve our Hexawise test case generating tool, is to make the efficiency-enhancing Design of Experiments methods embodied in the book accessible to “regular” software testers and more broadly adopted throughout the software testing field.  Some days, it feels like a shame that approaches from the Design of Experiments field (extremely well known and broadly used in manufacturing industries across the globe, in research and development labs of all kinds, and in product development projects in chemicals, pharmaceuticals, and a wide variety of other fields) have not made much of an inroad into software testing.  The irony is that it is hard to think of a field in which it is easier, quicker, or more immediately obvious to prove that dramatic benefits result from adopting Design of Experiments methods than software testing.

All it takes is for a testing team to decide to do a simple proof of concept pilot.  It could involve as little as half a day of testing activity for one tester.  Create a set of pairwise tests with Hexawise or another tool, like James Bach’s AllPairs tool.  Have one tester execute the tests suggested by the test case generating tool.  Have the other tester(s) test the same application in parallel.  Measure four things:

  1. How long did it take to create the pairwise / DoE-based test cases?
  2. How many defects were found per hour by the tester(s) who executed the “business as usual” test cases?
  3. How many defects were found per hour by the tester who executed the pairwise / DoE-based tests?
  4. How many defects were identified overall by each plan’s tests?

These four simple measurements will typically demonstrate dramatic improvements in:

  • Speed of test case identification and documentation
  • Efficiency in defects found per hour

As well as consistent improvements to:

  • Overall thoroughness of testing
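If you run the pilot, a tiny script can keep the four measurements consistent between the two groups of testers. This Python sketch is only one way to record them, and it rests on assumptions worth stating: defects per hour is computed against execution time only (not test design time), hours for multiple testers in one group are summed, and the class and field names are hypothetical. No sample data is included; the numbers should come from your own pilot.

```python
from dataclasses import dataclass


@dataclass
class PilotArm:
    """One side of the pilot: "business as usual" or pairwise / DoE-based tests."""
    name: str
    design_hours: float     # time spent selecting and documenting the tests
    execution_hours: float  # execution time, summed across this arm's testers
    defects_found: int      # distinct defects found by this arm's tests

    def defects_per_hour(self) -> float:
        # Assumption: rate is measured against execution time only.
        return self.defects_found / self.execution_hours


def pilot_measurements(baseline: PilotArm, doe: PilotArm) -> dict:
    """Collect the four measurements described in the pilot steps."""
    return {
        "1_doe_design_hours": doe.design_hours,
        "2_baseline_defects_per_hour": baseline.defects_per_hour(),
        "3_doe_defects_per_hour": doe.defects_per_hour(),
        "4_total_defects": {
            baseline.name: baseline.defects_found,
            doe.name: doe.defects_found,
        },
    }
```

Keeping design time and execution time in separate fields makes it easy to report both the speed of test case identification and the defects-per-hour efficiency without mixing the two.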

A Suggestion: Experiment / Learn / Get the Data / Let the Efficiency and Effectiveness Findings Guide You

I would be thrilled if this blog post gave you the motivation to explore this testing approach and measure the results.  Whether you’ve used similar-sounding techniques before or have never heard of DoE-based software testing methods, and whether you’re a software testing newbie or a grizzled veteran, I suspect the experience of running a structured proof of concept pilot (and seeing the dramatic benefits I’m confident you’ll see) could be a watershed moment in your testing career.  Try it!  If you’re interested in conducting a pilot, I’d be happy to help get you started, and if you’d be willing to share the results of your pilot publicly, I’d be able to provide ongoing advice and test plan review.  Send me an email or leave a comment.

To the grizzled and skeptical veterans (and yes, Mr. Shrini Kulkarni / @shrinik, who tweeted “@Hexawise With all due respect. I can’t credit any technique the superpower of 2X defect finding capability. sumthng else must be goingon” before you actually conducted a proof of concept using Design of Experiments-based testing methods and analyzed your findings, I’m lookin’ at you), I would (re)quote Sophocles: “One must try by doing the thing; for though you think you know it, you have no certainty until you try.” For newer testers eager to expand your testing knowledge (and perhaps gain an enormous amount of credibility by taking the initiative while you’re at it), I’d (re)quote Cole Porter: “Experiment and you’ll see!”

I’d welcome your comments and questions.  If you’re thinking, “Sounds too good to be true, but heck, I can secure a tester for half a day to run some of these DoE-based / pairwise tests and gather some data to see whether or not it leads to a step-change improvement in the efficiency and effectiveness of our testing,” and you’re wondering how to get started, I’d be happy to help you out at no cost to you.  All I’d ask is that you share your findings with the world (e.g., on your blog, or by letting me use your data, as the firms in the “Combinatorial Software Testing” article below did with their findings).

– Justin

Related: (Introductory Hexawise video overview showing 6.5 trillion possible tests reduced, using Design of Experiments techniques, to the 37 tests most likely to find defects)

Related: (Article explaining the concepts behind Design of Experiments-based software testing techniques such as pairwise, OA, and n-wise testing) Combinatorial Software Testing by Kuhn, Kacker, Lei, and Hunter (pdf download)

Related: (Prior blog post) “In Praise of Data-Driven Management (AKA “Why You Should be Skeptical of HiPPO’s”)”

Related: (My brother’s blog: he’s in IT too and is also a strong proponent of using Design of Experiments-based software test design methods to improve software testing efficiency and effectiveness).

Defect Seen >10 Million Times and Still not Corrected…

I responded to a recent blog post written by Gareth Bowles today and was struck – again – that a defect that must have been seen >10 million times by now has still not been corrected. When a post on Blogger.com has exactly one comment, the comment counter says “1 comments” instead of correctly stating “1 comment.” What’s up with that?

The clothing company Lands’ End (with the apostrophe erroneously after the s instead of before it) has a bizarre but somewhat logical explanation for why it has printed its grammatical-mistake-laden brand name on millions of pieces of clothing. According to one version of the story I have heard, the company printed its first brochures with the typo and couldn’t afford to get it changed. I also remember reading a more detailed explanation in a catalog in the late ’80s to the effect that by the time company management realized the mistake and tried to get trademark protection on “Land’s End,” they discovered that another firm already had trademarked rights to that name. Quick internet searches can’t verify that, so perhaps my memory is just playing tricks on me. But I digress. Here’s the defect I wanted to highlight with this post:

For Blogger.com to leave the extra “s” in has me stumped for several reasons. First, a ton of people have seen this defect, because Blogger.com is, according to Alexa’s site tracking, the world’s 7th most popular site. Second, Blogger.com is owned by Google (among the most competent, quality-oriented IT wizards on the planet), and no trademark protection is preventing the correction. Third, it would seem to be such an easy thing to fix. Fourth, other sites (like WordPress) don’t make the same mistake. Fifth, it doesn’t seem like a “style preference” issue (like spelling traveled with one “L” or two); it seems to me like a pretty clear case of a mistake. It would be a mistake to say “one cars,” “one computers,” or “one pedantic grammarians”; similarly, it is a mistake to say “one comments.” What gives? Anyone have any ideas?

For anyone wondering where the “>10 million times” figure came from, it is pure conjecture on my part. If anyone has a reasoned way to refute or confirm it or propose a better estimate, I’m all ears.
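Without knowing how Blogger’s templates actually generate the counter, here is only a hypothetical Python sketch of the kind of fix involved. The English rule is simple: exactly one takes the singular noun, and every other count, including zero, takes the plural. It is also a reminder of boundary value thinking: a check of the counts 0, 1, and 2 would have caught the “1 comments” defect. The function name comment_label is made up for illustration.

```python
def comment_label(count: int) -> str:
    """Render a comment counter with correct English singular/plural."""
    # Only exactly one takes the singular: "0 comments", "1 comment", "2 comments".
    noun = "comment" if count == 1 else "comments"
    return f"{count} {noun}"


for n in (0, 1, 2):
    print(comment_label(n))  # 0 comments / 1 comment / 2 comments
```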

Great Bug Tracking Tool – Tails

I’ve recently tried out Tails as a bug tracking tool. I like it and I’d recommend you check it out if you’re looking for a straightforward bug-tracking tool without a lot of extra bells and whistles. This is a quick review of what I have found to be the best defect tracking tool for my purposes.

When someone recommends something to you (whether a movie, a car, or a software application), it is useful to have an understanding of where they’re coming from; when ordering from Netflix, will they be drawn to the gritty genius of “The Usual Suspects” or an animated Disney classic like Fantasia? Is their idea of the perfect car a 36 HP 1959 Karmann Ghia convertible or a 2009 Humvee?

With that said, here’s where I’m coming from with respect to software applications. I’ve always appreciated nice, simple, cleanly designed software applications that work as you’d like them to without requiring you to spend time searching help files or sitting through training. My appreciation for clean, straightforward applications has increased in the last year as I’ve taken on more hands-on Product Management responsibilities at Hexawise and seen firsthand how hard it can be to strike the right balance between (1) the goals of elegance and simplicity on the one hand and (2) a Product Manager’s natural desire to equip the application with additional features and functionality on the other.

The screen capture tool Skitch has done a superb job of achieving this balancing act, as Sean Johnson describes well in his article, in which he writes: “These days it takes more than being an adequate solution to a real problem that people have and are willing to pay to solve. That’s certainly required, but it’s just not enough. You have to create happiness and joy in your users and they must love your product.” Unfortunately, Skitch is only available to Mac users for now. Similarly, Jason Fried and the gang at 37signals have done an excellent job of putting together simple, clean, powerful applications like Basecamp and Highrise. I strongly recommend their blog, Signal vs. Noise, about “design, business, experience, simplicity, culture and more,” and their book “Getting Real.” I’ve been heavily influenced by the designers of those tools when making Product Management and Design decisions about our test design tool Hexawise.

Enough preamble. My point is, if you appreciate the similar design philosophies behind Skitch, Highrise, Basecamp, and Hexawise, which place a premium on nice, clean, intuitive design (and explicitly try to avoid “feature bloat”), I suspect you’ll like Tails as a bug-tracking tool and enjoy using it. By design, Tails doesn’t have a lot of bells and whistles. It handles the features the vast majority of users need so well that it is a joy to use. I’ve attached a couple of screen shots below (with a few redactions to protect client confidentiality, etc.).

– Justin

What Software Testers Can Learn from the Game of 20 Questions

Dave Whalen posted a good piece here asserting that software testing, done well, requires a blend of “Science” and “Art”. I recommend it. (He also has a good post about testing databases here).

He includes the statement below, which I agree with. If you are a software tester and have any doubts about whether all of these methods work (pairwise software testing in particular), I would encourage you to conduct a pilot project of your own and measure the results achieved with and without the technique applied.

From the scientific side, testing can include a number of proven techniques such as equivalency class testing, boundary value analysis, pair-wise testing, etc. These techniques, if used properly, can reduce test times and focus on finding the bugs where they tend to hang out – much like a porch light on a summer night.

My response to Dave’s post, included below, is not especially profound or even well-written, but, hey, I’m in a hurry in the pre-Thanksgiving rush and the topic hit close to home so I couldn’t resist jotting a little something. Enjoy. Please let me know your thoughts / reactions if you have any.

Dave,

Very well said!

I wholeheartedly, enthusiastically agree with your premise. I also wish that more people saw things the same way.

My father co-wrote Statistics for Experimenters, which describes the “art and science” within the Design of Experiments (“DoE”) field of applied statistics. Well-run manufacturing companies use DoE techniques in their manufacturing processes. Many companies, such as Toyota, see them as an absolutely fundamental part of their processes (yet, unfortunately, software testers, who could use DoE techniques such as pairwise and other forms of combinatorial testing, are often ignorant about how to use them properly, and the software testing industry as a whole dramatically under-utilizes such techniques…. but I digress).

I brought the book up because it opens with a good example relevant to the points you made. To win at the game of 20 Questions, it is useful to know “the science” of game theory and DoE: choose questions so that there is a 50/50 chance that the answer will be Yes. Someone who knows this technique will, all else being equal, win more often because of their “scientific” approach than someone who doesn’t. And yet… other things (subject matter expertise in this example, or subject matter expertise and “artistic” Exploratory Testing in your example) are indispensable as well.

You can’t truly excel at either 20 Questions or software testing unless you have a good mix of “science” (governed by mathematical principles, proven methods of DoE, etc.) and “art” (governed by experience, instincts, and subject matter expertise).

– Justin
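The 50/50 rule from the 20 Questions example in my reply above can be checked with a small simulation. If every question splits the remaining possibilities in half, each answer rules out half of what is left, so n questions can distinguish up to 2^n possibilities, and 20 questions are enough for about a million. The Python sketch below assumes, for simplicity, that the secret is one of the whole numbers 1 to 2^20 and that every candidate is equally likely (real 20 Questions targets are not); the function name play is hypothetical.

```python
import math


def play(candidates: range, secret: int):
    """Find `secret` by always asking a question that splits the remaining
    candidates in half ("Is it in the first half?").

    Returns (answer, number_of_questions_asked).
    """
    remaining = candidates
    questions = 0
    while len(remaining) > 1:
        mid = len(remaining) // 2
        questions += 1
        if secret in remaining[:mid]:  # answer: "Yes"
            remaining = remaining[:mid]
        else:                          # answer: "No"
            remaining = remaining[mid:]
    return remaining[0], questions


# 20 yes/no answers can distinguish 2**20 = 1,048,576 possibilities.
universe = range(1, 2**20 + 1)
print(play(universe, 777_777))              # (777777, 20)
print(math.ceil(math.log2(len(universe))))  # 20
```

By contrast, a question whose answer is almost always No (“Is it exactly 42?”) eliminates only one candidate at a time, so 20 such questions would rule out just 20 possibilities.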

Cem Kaner: Testing Checklists = Good / Testing Scripts = Bad?

I highly recommend this presentation by Cem Kaner (available here as a pdf download of slides). It is provocative, funny, and insightful. In it, Cem Kaner makes a strong case for using checklists (and mercilessly derides many aspects of using completely scripted tests). Cem Kaner, as I suspect most people reading this already know, is one of the leading lights of software testing education. He is a professor of computer sciences at Florida Institute of Technology and has contributed enormously to software testing education by writing Testing Computer Software, “the best selling software testing book of all time,” founding the Center for Software Testing Education & Research, and making an excellent free course on Black Box Software Testing available online. <Trivia: Cem Kaner is one of two people I know of working in software testing today who have a law degree; the other person is me. After graduating from the University of Virginia Law School, I worked as a lawyer in London and Hong Kong for a large global firm before coming to my senses and realizing my interests, happiness and competence lay elsewhere>.

Here are a couple of my favorite slides from the presentation.

My own belief is that the presentation is very good and makes its points quite well. If I have a minor quibble with it, it is that in doing such a good job of laying out the case for checklists and against scripted testing, the presentation – almost by definition/design – does not go into as much detail as I would personally like about a topic that I think is extremely important and not written about enough: namely, how practitioners should blend the advantages of scripted tests (which can deliver, for example, the huge efficiency benefits of combinatorial testing methods) with those of checklist-based Exploratory Testing (which has the advantages pointed out so well in the presentation). A “both / and” option is not only possible; it is desirable.

– – –

Credit for bringing this presentation to my attention: Michael Bolton (the testing expert, of course, not the singer) posted a link to this presentation. Thanks again, Michael. Your enthusiastic recommendation to pick up boxed sets of the BBC show Connections was excellent as well; the presenter of Connections is like a slightly tipsy genius with ADHD who possesses an incredible grasp of history, an encyclopedic knowledge of quirky scientific developments, and a gift for storytelling. I like how your mind works.