
How to Create Effective A/B Tests In 6 Steps


Bonus Material: <cc-checklist>50 Questions to Ask in Customer Surveys<cc-checklist>

Did you ever compete in a science fair as a kid? 

You came up with a hypothesis – a plant grows much better in direct sunlight, for instance – and then created an experiment to test your theory. 

You made sure to control for only what you were testing – sunshine – by using all the same types of plants and soil, and the exact same amount of water in every experiment. 

You grew your plants in various amounts of sunlight and recorded the results. When it was done, you created a three-sided board to display your results and analysis.

Congratulations – you already know how to run an effective A/B test.

And if you haven't competed in a science fair, no worries. We're sure you're still capable of running successful tests for your ecommerce store.

Customers are slightly more complex than plants, but the principles stay the same. 

Step 1: Where to Begin Testing

Effective testing is a long-term game.

The whole point of testing is to move only in the right direction and not waste time or money breaking big things that matter.

Testing begins with your mindset, your resources, your e-commerce store – everything that’s available to you and unique to you.

Strengthening your understanding of the testing process and integrating testing into your team’s process gives you the ability to reel off successful tests consistently.

And that will raise your revenue through improved conversions. 

Testing is a way to build up a systematic approach to optimizing conversions across your entire organization. It’s not an overnight strategy.


Step 2: Figuring Out What to Test

Even if you already have some testing experience, hold off on full-page redesigns (like a complete redesign of your homepage on day one). If you have no testing experience at all, definitely start with something smaller.

It’s important to start with a balance of something that gives you a true lift but also won’t take forever to set up and run. 

You’ll want to be able to work quickly through the process of setting up, monitoring, and analyzing the test results with your team and testing agency.

Getting through some tests early on also reveals any flaws in your testing approach right away, so you can fix them before you start testing something huge.

Here are some simple ideas on tests you can set up:

  • Customer flow through your checkout, removing clutter
  • Copywriting, or specific words you use in headlines, buttons and more
  • Use of social proof
  • Use of security indicators
  • Colors
  • Navigation and search improvements
  • Videos, animation or none
  • Call-to-action elements, including colors, copy, and location
  • Fonts and text size
  • Identifying and removing distractions from pages

Ensure your testing tool is capable of and configured to measure the right things. And confirm that you’ve chosen the right measurements for the test you’re running.

Step 3: How To Set Up a Statistically Significant Test

We've mentioned how important it is to create tests that are meaningful.

And the hardest part of A/B testing is making sure your test will mean something. You need to know those results mean what you think they mean.

When running a test to figure out what color to make your homepage button, let’s say you get 20 visits: 15 people prefer blue, while only 5 prefer green.

You could conclude that it should definitely be blue. But wait! That's not a sufficient sample to run your tests through. What if the next 7 people pick green? 
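Don’t just take our word for it. Here’s a quick sketch in Python that runs those toy numbers through a two-sided binomial test – the counts are the hypothetical ones from our example, not real data:

```python
# Sanity-checking the toy button-color numbers with a binomial test.
# These counts are hypothetical, straight from the example above.
from scipy.stats import binomtest

# 15 of the first 20 visitors went for blue.
print(binomtest(15, n=20, p=0.5).pvalue)  # ~0.041 -- looks "significant"

# ...but if the next 7 visitors all pick green: 15 blue vs. 12 green.
print(binomtest(15, n=27, p=0.5).pvalue)  # ~0.70 -- the effect evaporates
```

Seven more visitors turn a seemingly decisive result into pure noise – exactly why tiny samples can’t be trusted.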

You need at least 1,000 transactions – or around 25,000 uniques – in order for any kind of testing to make sense.

Testing guru Evan Miller summed this up:

When an A/B testing dashboard says there is a “95% chance of beating original” or “90% probability of statistical significance,” it’s asking the following question: Assuming there is no underlying difference between A and B, how often will we see a difference like we do in the data just by chance? 

The answer to that question is called the significance level, and “statistically significant results” mean that the significance level is low, e.g. 5% or 1%. 

Dashboards usually take the complement of this (e.g. 95% or 99%) and report it as a “chance of beating the original” or something like that. However, the significance calculation makes a critical assumption that you have probably violated without even realizing it: that the sample size was fixed in advance. If instead of deciding ahead of time, “this experiment will collect exactly 1,000 observations,” you say, “we’ll run it until we see a significant difference,” all the reported significance levels become meaningless. 

We recommend using Evan Miller’s A/B test sample size calculator to avoid this issue. Keep an eye on statistical significance at all times!
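If you’d like to see roughly what’s under the hood of such a calculator, here’s a minimal sketch assuming a standard two-sided, two-proportion z-test – the baseline rate and minimum detectable effect below are illustrative numbers, not recommendations:

```python
# A rough sample-size calculation for an A/B test, assuming a
# two-sided, two-proportion z-test (the approach behind most
# sample size calculators). Illustrative numbers only.
from scipy.stats import norm

def sample_size_per_variant(base_rate, min_effect, alpha=0.05, power=0.8):
    """Visitors needed per variant to detect an absolute lift of
    `min_effect` over `base_rate` at the given significance and power."""
    p1, p2 = base_rate, base_rate + min_effect
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)           # 0.84 for 80% power
    pooled = (p1 + p2) / 2
    n = ((z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
          + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
         / (p2 - p1) ** 2)
    return int(n) + 1

# Example: 5% baseline conversion, detect a 1-point absolute lift.
print(sample_size_per_variant(0.05, 0.01))  # roughly 8,000+ per variant
```

Notice how demanding the math gets: spotting a one-point lift on a 5% baseline takes roughly 8,000 visitors per variant. The 1,000-transaction rule of thumb above is a floor, not a target.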

Step 4: How To (Patiently) Monitor Your Test

Curiosity killed the cat. 

Not because cats are curious, but because every time you peek at ongoing tests, a cat dies. 

Do not peek at your tests early!

Truth is, you won't ruin anything. You won't kill cats, either.

But you will be protecting yourself from yourself.

It’s almost impossible to resist acting when you look early. Even if you try not to, you’ll have that information in the back of your mind. 

It’s best to just not look!

(Think about the cats if that motivates you.)

Step 5: The Right Duration To Run Your Tests

The decision to stop a test depends on when you’ve got enough observations to make your results statistically meaningful.

Remember to calculate your sample size ahead of time, based on the significance and power you want. Find out what your ideal sample size is. Don’t simply run your experiments until you see a significant change.

Ironically enough, those changes might not be significant at all.

Here’s an example from Evan Miller showing why stopping the experiment in the latter situation is problematic:

[image]

Example A: Experiment run until a statistically significant number of observations is reached

[image]

Example B: Experiment run until a statistically significant difference appears in the observations

As you can see in the second example, in two scenarios the test was stopped too early. 

This gives you completely skewed results that would over-emphasize the percentage change of conversions. If you made a change based on that, you might not see any results. 
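You can see this for yourself with a simulation. Here’s a sketch of a Monte Carlo A/A test: both variants share the exact same true conversion rate, so every “significant” result is a false positive (all the parameters are illustrative):

```python
# Simulating why peeking breaks significance tests. Both variants have
# the SAME 5% conversion rate (an A/A test), so any "win" is a false
# positive. All parameters here are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
RATE = 0.05        # true conversion rate for both variants
N = 5_000          # visitors per variant at the planned horizon
CHECK_EVERY = 250  # how often the impatient tester peeks
RUNS = 1_000       # number of simulated experiments

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a two-proportion z-test on raw counts."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * stats.norm.sf(abs(z))

peeking_wins = fixed_wins = 0
for _ in range(RUNS):
    a = rng.random(N) < RATE  # per-visitor conversions, variant A
    b = rng.random(N) < RATE  # per-visitor conversions, variant B
    # Impatient tester: declare a winner at the first peek with p < 0.05.
    if any(p_value(a[:n].sum(), n, b[:n].sum(), n) < 0.05
           for n in range(CHECK_EVERY, N + 1, CHECK_EVERY)):
        peeking_wins += 1
    # Patient tester: one single test at the pre-planned sample size.
    if p_value(a.sum(), N, b.sum(), N) < 0.05:
        fixed_wins += 1

print(f"False positives when peeking: {peeking_wins / RUNS:.1%}")  # well above 5%
print(f"False positives at fixed N:   {fixed_wins / RUNS:.1%}")    # ~5%
```

On a typical run, the peeking tester “finds” a winner in a sizable chunk of experiments where no real difference exists, while the patient tester stays near the nominal 5%.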

Step 6: How To Properly Analyze the Results

The first thing you should do when your tests are done? Check your results.

The second? Check them again. 

And maybe again, for good measure. 

(Third time's the charm, after all.)

Fancy software and shiny calculators can tell us a lot, and they make testing a lot easier. However, we're optimizing for humans. 

If you’re relying solely on software like VWO and Optimizely, then you should know those two tools have hidden all of the information about their actual analyses. (Presumably to keep their methodologies secret as they engage in a shootout for testing supremacy.)

We’re not saying you need to doubt all results from VWO or Optimizely, but it does become impossible to independently verify your results when the actual analysis is hidden behind an opaque black box.
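The good news: the underlying math isn’t secret, even if the dashboards are. As a sanity check, you can re-run the numbers yourself from raw counts. Here’s a minimal sketch assuming a standard two-proportion z-test – the counts are made up for illustration, not exported from VWO or Optimizely:

```python
# Independently verifying an A/B result from raw counts with a
# two-proportion z-test. The counts below are illustrative only.
import math
from scipy.stats import norm

def two_proportion_test(conv_a, n_a, conv_b, n_b):
    """Return (relative lift of B over A, two-sided p-value)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return (rate_b - rate_a) / rate_a, 2 * norm.sf(abs(z))

# Example: control converted 400 of 10,000; variant converted 460 of 10,000.
lift, p = two_proportion_test(400, 10_000, 460, 10_000)
print(f"Relative lift: {lift:.1%}, p-value: {p:.3f}")  # 15.0%, ~0.037
```

If your own calculation and the dashboard disagree wildly, that’s your cue to dig into what the tool is actually measuring.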

Ideally, always have a human perspective on data. And have cold, hard data to counteract human biases.

Remember how our overall testing method combines quantitative analytics and qualitative user research before we reach the testing stage? 

This complementary relationship between data and your creativity and instincts also applies when analyzing your test results.

Your New Motto: Move Deliberately and Improve Things

"Move fast and break things."

Your grandmother would disagree, and so do we.

We understand the sentiment behind it. This mindset encourages you to experiment, and move on to other things if the actions fail. 

It helps organizations avoid stagnation or endless planning loops without action, sure.

It also encourages action based on potentially incomplete results.

Testing requires patience, but it doesn’t mean stagnation. 

Applying science rather than gut instinct means making changes proven to drive higher conversions. And higher revenue.

Creating a testing system and integrating a testing mindset and process into your ecommerce business will help you create sustainable gains.

Rather than running headlong without a destination in mind, you’ll move steadily toward your goals – and those revenue gains.

To quote your grandmother again: slow and steady wins the (testing) race.

<download-cc-checklist>Download 50 Questions to Ask in Customer Surveys NOW<download-cc-checklist>
