In Praise of Data-Driven Management (AKA “Why You Should Be Skeptical of HiPPOs”)


Jeff Fry recently linked to a fantastic webcast, “Controlled Experiments To Test For Bugs In Our Mental Models.” I recommend it without reservation. Ron Kohavi, of Microsoft Research, does a superb job of using interesting real-world examples to explain the benefits of conducting small experiments with web site content and the advantages of making data-driven decisions. The 22-minute video is here.

I firmly believe that the power of applied-statistics-based experiments to improve products is dramatically under-appreciated by businesses (and, for that matter, business schools), as well as by the software development and software testing communities.  Google, Toyota, and Amazon.com come to mind as notable exceptions to this generalization; they “get it.”  Most firms, though, still operate with their heads in the sand, to their detriment, and place too much reliance on untested guesswork, even for fundamentally important decisions that would be relatively easy to double-check, refine, and optimize through the kind of small applied-statistics-based experiments that Kohavi advocates.  Few people who understand how to conduct such experiments properly are as articulate and concise as Kohavi.  Admittedly, I could be accused of bias: (a) I am the son of a prominent applied statistician who passionately promoted broader adoption of such methods by industry, and (b) I am the founder of a software testing tools company whose tool is built on applied-statistics-based methods and algorithms.

Here is a short summary of Kohavi’s presentation: Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO

1:00 Amazon: in 2000, Greg Linden wanted to add recommendations to shopping carts during the checkout process. The “HiPPO” (the Highest Paid Person’s Opinion) was against it, thinking that such recommendations would confuse and/or distract people. Amazon, a company with a good culture of experimentation, decided to run a small experiment anyway, “just to get the data.” It was wildly successful, and the feature is in widespread use today at Amazon and other firms.

3:00 Dr. Footcare example: Including a coupon code above the total price to be paid had a dramatic impact on abandonment rates.

4:00 “Was this answer useful?” Dramatic differences in user response rates occur when Y/N is replaced with 5 stars, and when an empty text box is shown up front rather than appearing only after a user clicks to give an initial response.

6:00 Sewing machines: experimenting with a sales promotion strategy led to an extremely counter-intuitive pricing choice.

7:00 “We are really, really bad at understanding what is going to work with customers…”

7:30 “DATA TRUMPS INTUITION” {especially on novel ideas}. Get valuable data through quick, cheap experimentation. “The less the data, the stronger the opinions.”

8:00 Overall Evaluation Criteria: “OEC” What will you measure? What are you trying to optimize? (Optimizing for the “customer lifetime value”)

9:00 Analyzing data / looking under the hood is often useful to get meaningful answers as to what really happened and why

10:30 A/B tests are good; more sophisticated multi-variate testing methods are often better
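As a minimal illustration of the mechanics behind an A/B split (my own sketch, not from the talk): sites typically assign each visitor to a variant deterministically, often by hashing a user identifier, so the same visitor sees the same version on every visit. The function and experiment names below are hypothetical.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically assign a user to a variant by hashing.

    Hashing (rather than picking randomly on each request) ensures a
    given user sees the same variant on every visit, while the split
    across many users remains roughly even.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Same user, same experiment -> same bucket every time:
assert assign_variant("user42", "checkout-test") == assign_variant("user42", "checkout-test")
```

Changing the experiment name re-shuffles users into new buckets, which keeps separate experiments statistically independent of one another.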

12:00 Some problems: Agreeing upon Overall Evaluation Criteria is hard culturally. People will rarely agree. If there are 10 changes per page, you will need to break things down into smaller experiments.

14:00 Many people are afraid of multiple experiments [e.g., multi-variate experiments or MVE] much more than they should be.

(A/B testing can be as simple as changing a single variable and comparing what happens when it is changed, e.g., A = “web page background = blue” / B = “web page background = orange.” Multi-variate experiments involve changing multiple variables in each test run, which means that the people running the tests need to vary those variables efficiently and effectively, ensuring not only that each variable is tested but also that each variable is tested in conjunction with each of the others, because they might interact with one another.)

<My views on this: before software tools made conducting multi-variate experiments (and understanding their results) a piece of cake, this fear had some merit; you would need to be able to understand books like this to competently run and analyze such experiments. Today, however, many tools, such as Google’s Website Optimizer (used for making web sites better at achieving their click-through goals, etc.) and Hexawise (used to find defects with fewer test cases), build the complex Design of Experiments-based optimization algorithms into the tool’s computation engine and provide the user with a simple interface and user experience. In short, in 2009, you don’t need a PhD in applied statistics to conduct powerful multi-variate experiments. Everyone can quickly learn how to, and almost all companies should, use these methods to improve the effectiveness of applications, products, and/or production methods. Similarly, everyone can quickly learn how to, and almost all companies should, use these methods to dramatically improve the effectiveness of their software testing processes.>
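To make the “each variable tested in conjunction with each of the others” idea concrete, here is a toy sketch of pairwise (2-way) test generation, the kind of Design of Experiments technique these tools automate. This is my own naive greedy illustration, not the actual algorithm used by Hexawise or Website Optimizer, and it brute-forces the full combination space, so it only suits small examples.

```python
from itertools import combinations, product

def pairwise_tests(params):
    """Greedy sketch of pairwise (2-way) test-case generation.

    params: dict mapping parameter name -> list of possible values.
    Returns a list of test cases (dicts) that together cover every
    pair of values across any two parameters at least once, usually
    with far fewer cases than the full factorial set.
    """
    names = list(params)
    # Every value pair that must appear in at least one test case.
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in params[a]
        for vb in params[b]
    }
    tests = []
    while uncovered:
        # Pick the full combination that covers the most uncovered pairs.
        best, best_covered = None, set()
        for combo in product(*(params[n] for n in names)):
            case = dict(zip(names, combo))
            covered = {
                ((a, case[a]), (b, case[b]))
                for a, b in combinations(names, 2)
            } & uncovered
            if len(covered) > len(best_covered):
                best, best_covered = case, covered
        tests.append(best)
        uncovered -= best_covered
    return tests

# Three parameters with three values each: 27 full-factorial combinations,
# but far fewer cases suffice to cover every pair of values.
params = {
    "browser": ["Chrome", "Firefox", "IE"],
    "os": ["Windows", "Mac", "Linux"],
    "language": ["en", "fr", "de"],
}
print(len(pairwise_tests(params)))  # noticeably fewer than 27
```

The same coverage idea scales to dozens of parameters in real tools, where the savings over exhaustive testing become enormous.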

16:00 People do a very bad job at understanding natural variation and are often too quick to jump to conclusions.
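A small simulation (my own illustration, not from the talk) makes this point vividly: even when two variants have the *identical* true conversion rate, small samples routinely produce observed differences large enough that an unwary observer would crown a “winner.”

```python
import random

def false_winner_rate(true_rate=0.05, n=500, runs=1000, seed=1):
    """Simulate repeated A/B tests where BOTH arms share the SAME true
    conversion rate, and report how often a naive comparison would see
    a 'winner' (more than a 20% relative difference in observed rates).
    """
    rng = random.Random(seed)
    false_winners = 0
    for _ in range(runs):
        a = sum(rng.random() < true_rate for _ in range(n))
        b = sum(rng.random() < true_rate for _ in range(n))
        if a and b and max(a, b) / min(a, b) > 1.2:
            false_winners += 1
    return false_winners / runs

# With only 500 visitors per arm, chance alone frequently produces an
# apparent "winner"; with far larger samples it almost never does.
print(false_winner_rate(n=500))
print(false_winner_rate(n=5000))
```

This is exactly why proper experiments report statistical significance rather than raw observed differences: natural variation shrinks as samples grow, and conclusions drawn too early are often just noise.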

17:00 eBay does A/B testing and makes the treatment group ~1% (so roughly 99% of visitors see the control). Ron Kohavi, the presenter, suggests starting small and then quickly ramping up to 50/50 (e.g., 50% of viewers see version A, 50% see version B).

19:00 Beware of launching experiments that “do not hurt”; there are feature maintenance costs.

20:00 Drive to a data-driven culture. “It makes a huge difference. People who have worked in a data-driven culture really, really love it… At Amazon… we built an optimization system that replaced all the debates that used to happen on Fridays about what gets on the home page with something that is automated.”

21:00 Microsoft will be releasing its platform for controlled experiments on the web at some point in the future, but probably not in the next year.

21:00 Summary

  1. Listen to your customers because our intuition at assessing new ideas is poor.
  2. Don’t let the HiPPO drive decisions; they are likely to be wrong.  Instead, let the customer data drive decisions.
  3. Experiment often; create a trustworthy system to accelerate innovation.

Justin Hunter
Founder and CEO
Hexawise
“More coverage. Fewer tests.”

Related: Statistics for Experimenters, articles on design of experiments

