The Foundation of A/B Testing for Ecommerce Growth

This ecommerce A/B testing guide covers everything you need to get started with A/B testing on your ecommerce store. There are five chapters, and some chapters come with complementary documents, such as cheat sheets, to help you get started quicker.

Chapter 1

Ecommerce Growth: Understanding A/B Testing

Imagine you had a goal to ride a bicycle a hundred miles.

Of course, you’d want to make sure you knew how many miles you’d gone. You’d want to strive for that first 10, that first 20, the huge milestone of 50. You’d want to make sure you kept putting a lot of effort and energy into riding that bike so you could get to 100 eventually.

But you’d also make sure you were using the right wheels. You’d also check to see that your seat was the best kind for you, and that your pedals were the right type. You’d want a professional to look over your chain, brakes, alignment, and everything else that makes your bike run as efficiently as possible.

Because if you’re putting tons of energy into just hitting distance markers, you might be able to hit those goals faster and more easily with a tuned-up bike, right?

We should think about ecommerce the same way. The goal is always revenue, always more miles, but instead of thinking constantly about traffic and the sheer effort it takes to keep growing your traffic base, what if we thought about the efficiency of the store at the same time?

Ultimately, there are two ways to improve the revenue of your ecommerce store:

1. More traffic to your properties
2. More conversions with your existing traffic

We get super focused on growth and traffic, and for early stage e-commerce stores, we encourage that laser focus. Your store flat-out can’t be successful without traffic. Without traffic, improving your conversions won’t accomplish much, and you’ll lack enough data to even run tests.

But once you’re seeing a solid traffic base and you’re wondering what else you can possibly do to improve your numbers, start thinking about your website in a different way. Instead of, “How do I get more people to visit my site?” start thinking, “How do I get more people who are already visiting to take the actions I want?”

We’re talking about the art and science of conversion rate optimization (CRO), of course, and the main mechanism to conduct CRO – A/B testing, or split testing.

At its most basic, A/B testing sets up one change in one variable, and records any difference in your customer behavior to help you determine if you should make the change or not.

So you have the control, which is exactly what you’re already doing, and then the same exact page with one thing changed:

We go a lot more into how to set up effective A/B tests in some of our other posts, but it’s important to keep your tests scientific by only changing one element at a time.

For instance, if you’re trying to figure out if adding an interstitial helps or harms your sales page conversion rate, don’t also change the copy and button colors at the same time. You won’t be able to tell which change caused the uptick or drop in conversions.

It takes some patience, but isolate exactly what you want to improve and change one thing at a time.
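
As a rough illustration of the control-versus-variant split described above, here is a minimal Python sketch of one common way to divide visitors between the two versions of a page. The hashing approach, the function name `assign_variant`, and the experiment name `cta-button-test` are all assumptions made for illustration; your testing tool may handle assignment differently.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str = "cta-button-test") -> str:
    """Deterministically place a visitor in 'control' or 'variant' (50/50).

    Hashing the visitor ID together with the (hypothetical) experiment name
    means the same visitor always sees the same version of the page.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # a number from 0 to 99
    return "control" if bucket < 50 else "variant"


# Made-up visitor IDs, just to show that the split comes out roughly even.
counts = {"control": 0, "variant": 0}
for n in range(1000):
    counts[assign_variant(f"visitor-{n}")] += 1
print(counts)
```

Only one thing differs between the two groups (the page version), which is what lets you attribute any difference in behavior to that single change.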

Why to Divert Some Attention from Growth Toward CRO

Let’s back up for a minute before we jump further into “What is A/B testing?” and answer “Why A/B testing?”

If growth is so important, why does it make sense to stop focusing solely on growth and start paying attention to CRO?

Because when it comes to getting the best results for your effort, it’s hard to beat the incremental changes and potentially huge gains you can get through optimizing based on testing.

For example, let’s say – as the example below shows – that you have 10,000 visitors to your site during a given period. We’ll call it 10k visitors a month, for simplicity.

So every month, 10,000 people visit your site. Of those 10k, 60% visit your shopping area, 30% of those place an item in their cart and 3% of those make a purchase, leaving you with 54 conversions, or customers who bought.

Imagine that through some small tweaks like adjusting your sales messaging or shopping cart process, you could increase each of those rates by 2 percentage points.

Now, 62% of visitors make it into the shopping area. 32% of them put something in their cart. 5% buy it. Instead of 54 purchases, you now have 99, or almost double!

And if you could replicate those small gains across every important area of your web property, you’ve doubled your conversions without making any effort to grow your traffic.
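
The funnel arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation, assuming each rate applies to the visitors who reached the previous stage; the function name `funnel_purchases` is made up for illustration.

```python
def funnel_purchases(visitors, shop_rate, cart_rate, purchase_rate):
    """Multiply visitors through each funnel stage.

    Each rate is the share of people from the previous stage who continue.
    """
    return visitors * shop_rate * cart_rate * purchase_rate


before = funnel_purchases(10_000, 0.60, 0.30, 0.03)
after = funnel_purchases(10_000, 0.62, 0.32, 0.05)

print(round(before))             # purchases with the original rates
print(round(after))              # purchases after the small improvements
print(round(after / before, 2))  # how many times more purchases
```

Because the stages multiply, small gains at every step compound: the result goes from 54 purchases to roughly 99, a little over 1.8 times as many.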

That’s why testing matters. You could carry on running your store as usual, working on growing your traffic, and making changes to your website based on your intuition or what you’ve seen other stores pull off successfully – but nothing will work as well as homing in on what you should be doing based on durable testing.

The difference can be visualized like this:

A strong e-commerce operation may still get some conversion jumps without testing, but a statistically strong and properly set up testing program makes that line follow a much more exponential curve, as successful tests build on successful tests.

The Four Building Blocks of A/B Testing

If you’re ready to add optimization to your toolkit for growing your e-commerce business, that’s excellent. Let’s talk broadly about the things you’ll need to succeed.

Testing mindset
This may feel a bit fuzzy, but it’s the most important thing to arm yourself with before you start working on your CRO. At Objeqt, clients come to us at a variety of success levels and for a variety of reasons, but the first thing we work on is testing mindset and approach to testing.

Here are the things we believe about testing:

  • Testing produces the most sustainable lifts of any tactic you could employ
  • Testing should be done consistently and continuously
  • Testing requires patience to generate the best results
  • A successful testing program should combine short-term successes and long-term goals
  • The entire organization should adopt a testing culture where data- and research-driven testing leads to actionable changes
  • Testing significantly minimizes an organization’s risk exposure
  • Conversion improvement is as important as, or more important than, traffic for revenue growth

We talk about all those principles in detail in other posts, but those ideas form the basis of our testing philosophy.

If you’re new to testing, or if you’re thinking of testing as a quick and easy “hack” aimed solely at boosting revenue, we’d encourage you to carefully consider your longer-term goals.

Testing requires patience, and your testing philosophy should be geared toward discovering the right adjustments to make to deliver a better customer experience and sustainable gains for your store – not a quick boost and then on to the next thing.

Traffic for statistically sound tests
As a wise person once said (probably): There’s no replacement for traffic. Get it or you can’t test.

It might feel arbitrary and exclusionary, but there’s a reason behind this idea. If you run split tests with too few visitors, your test results might not be statistically sound.

For instance, let’s say you’re running a test about changing your call-to-action button on a landing page. You only have 10 visitors to that page, and 4 pick red and 6 pick green.

On its surface, 60% for green seems pretty conclusive, but when it’s only a difference of two visitors, it’s hard to justify changing all your buttons to green with any confidence. You might run the same test another day and the results could be swapped because there’s just not enough traffic to test.
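
To put a number on that intuition, here is a hedged sketch of an exact two-sided binomial test in Python. It assumes each of the 10 visitors chooses independently and that "no real preference" means a 50/50 split; it is not the method used by any particular calculator, and the function name `two_sided_binomial_p` is made up for illustration.

```python
from math import comb


def two_sided_binomial_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Adds up the probability of every outcome that is no more likely than
    the observed one, assuming the true rate is `p`.
    """
    probs = [
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # The small tolerance keeps equally likely outcomes (ties) in the sum.
    return sum(pr for pr in probs if pr <= observed * (1 + 1e-9))


p_value = two_sided_binomial_p(6, 10)  # 6 of 10 visitors picked green
print(f"p-value: {p_value:.3f}")
```

The p-value comes out at about 0.75, meaning a 6-to-4 split (or something more lopsided) would show up roughly three times out of four even if visitors had no preference at all. That is nowhere near the 0.05 threshold commonly used for significance.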

The Kissmetrics sample size calculator tells us that this test isn’t statistically significant:

There are a lot of automatic calculators out there to tell you how many conversions you need to run a statistically sound test, including Optimizely, VWO and our personal favorite from Evan Miller.

Each calculator runs a bit differently. If you have any questions about your model or what a calculator is telling you, we’ll always recommend working with an expert contractor. You get the double benefit of putting a specialist in charge of your testing, and you get to go back to what you do best – leading your e-commerce business.
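
For a feel of what these calculators do, here is a minimal sketch of one textbook sample-size formula for comparing two conversion rates (a normal approximation for a two-sided test). The 5% significance level, 80% power, and the 3% to 4% conversion rates are illustrative assumptions, and real calculators may use different formulas and give different answers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline: float, expected: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH version to detect the difference.

    baseline: current conversion rate (e.g. 0.03 for 3%)
    expected: conversion rate you hope the variant achieves
    alpha:    accepted false-positive rate (two-sided)
    power:    chance of detecting the difference if it is real
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    pooled = (baseline + expected) / 2
    numerator = (
        z_alpha * sqrt(2 * pooled * (1 - pooled))
        + z_power * sqrt(baseline * (1 - baseline) + expected * (1 - expected))
    ) ** 2
    return ceil(numerator / (expected - baseline) ** 2)


needed = visitors_per_variant(0.03, 0.04)
print(needed)
```

Under these assumptions the answer is on the order of five thousand visitors per version, which is why low-traffic stores struggle to run sound tests. Smaller expected differences push the number up quickly.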

Quantitative research (analytics)
Analytics needs to come before testing.

This can sometimes be a hard sell for store owners who already have enough traffic to begin a testing program – if that’s the case, most of the time the business is doing well and has had some success with traffic growth tactics.

It’s hard to say, “OK, now it’s time to properly set up a system for accurate quantitative research and analytics data gathering, and only then will it make sense to test!”

But this goes back to that crucial testing mindset – why test if you have a weak research foundation that ensures your test results will be useless? Or worse, dangerously incorrect?

We have a few posts that go into way more depth on how to properly set up an analytics program, including this one that we recommend starting with.

Qualitative research (customer research)

Analytics forms only one part of a solid research program. As we like to tell our clients, you can’t improve what you can’t measure – and data only tells you so much.

Complementing your analytics with a comprehensive customer research program yields the kinds of customer insights you’d only dream about otherwise. It’s crucial to try to understand your customer deeply and predict what will be important to them as they experience your website – before you start testing.

You have an infinite number of things you can test, but not all of them will make a difference to your customers. So you can use this deep research to home in on the important things and only spend your time testing those.

So now your bike is as tuned up as it’s ever going to be, and you’ve also put in all the effort growing your stamina and physical ability – you’re ready to cycle a hundred miles. Congratulations!

But bike technology changes. Parts wear down. Things fall out of alignment. Your needs change. If you’re ready to go your next hundred miles, you’ll need to continuously tune up your machine, right?

And ecommerce stores are the same way. Testing helps store owners establish a proven process and system for optimizing the most important aspects of the customer experience on a continuous basis.

Chapter 2

You Can’t Improve What You Can’t Measure: Why A/B Testing Starts With Analytics

The most common question we get from ecommerce shop owners is “When can I expect to see results from our A/B testing?”

It’s a natural question. You’ve decided to invest in the more sustainable and long-term gains you can get through a conversion rate optimization (CRO) program based on A/B testing. You’re looking beyond just traffic as a revenue growth track, and you’re ready to start seeing results from testing – now.

And we understand – we’re in it for your results, too. We also want to see you get the best possible results from taking the time to set up a strong testing program and create revenue gains through deliberate experience design changes. We balance a desire for long-term revenue growth with short-term “quick wins” to build a pervasive testing culture.

Those are the exact reasons why we’re completely honest with most people and tell them: we don’t know when you’ll see results.

The exact amount of time it takes to see results depends on a lot of factors. Oftentimes, ecommerce businesses need to set up the foundations of a testing process from the ground up. It depends on what your goals are, and what “results” means to you.

Most of the time, when we first start working with a client, we don’t start with testing at all. We start with understanding your business and the first step in that process is analytics.

The Relationship Between Analytics, Testing and CRO – or “The Gym Analogy”

Our favorite super-simple definition of CRO comes from the folks at Qualaroo. CRO is (emphasis ours):

…the method of using analytics and user feedback to improve the performance of your website.

And then they simplify it even further. CRO is:

…finding why visitors aren’t converting and fixing it.

We love how both get straight to the point, but we highlighted the words “analytics” and “user feedback” in the first definition because those two things are the key to why some organizations succeed at testing and CRO – and some don’t.

CRO is about figuring out where to close those holes in your funnel and improve your visitor experience. You can figure out what you need to do through targeted A/B testing.

But if you can’t trust that the numbers underpinning the whole enterprise are accurate…what’s the point? That’s where analytics comes in.

A properly set up analytics program provides the accurate data you need to support a testing and optimization strategy. It forms the foundational bedrock of the whole process.

Let’s use what we call “the gym analogy.”

You may see this pop up throughout our blogs because it’s such a useful analogy for understanding the process of testing and CRO.

You hire a personal trainer because you want to be able to lift 300 pounds. On the first day, you ask, “When will I be able to lift 300 pounds?”

Your personal trainer would never just say, “Five months.” They would probably explain that the ability to lift 300 pounds comes from a variety of factors, including how often you’re going to work out, your nutrition level, the types of exercises you’re doing, your existing strength level, your musculature and more.

As you begin your workout regimen, you’ll see a lot of “small wins” on the way to 300 pounds. One day, you’ll be able to lift 100. Another day, you’ll notice you’ve lost weight and gained muscle. And eventually, through meeting a number of small goals, you’ll be able to lift 300 pounds.

In our analogy, your workout routine and all those great exercise habits represent testing. And the foundational thing you need to lift weights? Weights, and weights that you can trust actually weigh what they say. That’s the analytics part of the analogy – data you can trust to form the foundation of your system.

What Makes a “Proper” Analytics Setup

So what makes an analytics setup trustworthy and accurate? There are a few major elements:
1. The Right Tools
You’re only as good as your tools. We’re big fans of industry standard Google Analytics, and here’s why:

  • It’s free. This is a major one. We’re respectful of our clients’ budgets and a “free, powerful enough” tool always beats a “paid, slightly-more-powerful tool” in our opinion. The fact that anyone can sign up for Google Analytics without having to worry about pricing themselves out of their own testing process gives it a major thumbs up in our book.
  • It’s an industry standard. A lot of people in CRO and ecommerce use Google Analytics. This means once you’re set up with Google Analytics, you have lots of options. There are a ton of resources, so you can work with it yourself. Or you can work with a consultant – and most professional testing and CRO consultants should know their stuff in Google Analytics.
  • It’s easy to use and scale. A lot of tools require a lot of study before you can move past the simple early metrics and really get into the meaty stuff. Not so with Google Analytics. Whether you’re looking at very standard KPIs or really diving in, the tool doesn’t get in your way or require hours of tutorials to become proficient at setting up experiments like this one:

We also like FullStory and HotJar, useful tools that tell you about your users’ click and scroll habits. In essence, these tools track where your visitors click, where they start or stop scrolling, and where they linger, and display the results through heat maps like this one:

These types of tools provide useful information for understanding customer behavior. All the tools can show the data in multiple types of heat maps, such as:

  • Click maps, that show where your visitors click on your website
  • Hover maps, that track the movements of the mouse pointer
  • Scroll maps, that track the vertical movement of the screen as visitors read your content

Of course, their usefulness does not stop there. Every one of these tools features a session recorder, where you can watch the interactions of each individual visitor. Their creators have included a few more options, such as form and funnel tracking. HotJar also offers the possibility to create surveys and polls.

Of course, everything these tools do can also be achieved using Google Analytics event tracking in conjunction with Google Tag Manager and Page Analytics. However, using heatmaps and session recordings adds an important visual dimension that may help you gain insights faster.

2. The Right KPIs
The right suite of analytics tools gets you started on the right track, but next you have to set up what you’re measuring with those tools.

The things you can measure are almost infinite, and which ones you select will depend on your ultimate goals, but here’s a list of some you can consider to start with:

  • Cart abandonment rate
  • Cart abandonment point (where in the process)
  • Traffic to sales pages
  • Unique visitors
  • Returning visitors
  • Scroll depth
  • Length of time on page

Once you’ve determined what you’re going to measure, you can set up your tools and platforms to deliver you data on a regular basis.

3. The Right Analysis
Of course, having the data isn’t the end-all, be-all either. You have to know what to do with the data and how to analyze it to see what it’s telling you.

For instance, if you’re looking at your cart abandonment data and you’re seeing a lot of dropoffs right after you ask customers to fill out a form, you could start thinking of tests designed to find out why. Is your form too long? Does it happen too early in the process? Do you require too many fields? All of this stems from being able to read your data.
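
Reading the data this way can be as simple as comparing how many visitors reach each step. The sketch below uses made-up step names and visitor counts (they are not from any real store) to show how you might spot the step with the largest drop-off; the function name `drop_off_rates` is hypothetical.

```python
# Hypothetical visitor counts at each checkout step (illustrative numbers only).
checkout_steps = [
    ("Cart", 1800),
    ("Shipping form", 1200),
    ("Payment", 450),
    ("Order confirmed", 400),
]


def drop_off_rates(steps):
    """Return (from_step, to_step, share_lost) for each consecutive pair of steps."""
    rates = []
    for (name, count), (next_name, next_count) in zip(steps, steps[1:]):
        lost = (count - next_count) / count if count else 0.0
        rates.append((name, next_name, lost))
    return rates


rates = drop_off_rates(checkout_steps)
for name, next_name, lost in rates:
    print(f"{name} -> {next_name}: {lost:.0%} drop off")

# The step with the largest share of visitors lost is the first place to look.
worst = max(rates, key=lambda r: r[2])
print("Largest drop-off:", worst[0], "->", worst[1])
```

In this invented data, most visitors are lost between the shipping form and payment, which would point your test ideas toward the form itself.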

The Other Side of Research

Of course, data can only tell you so much. For the best possible research-backed foundation for your testing program, you’ll need to talk to your customers.

How to ask your customers the right questions to get information to complement your data is a topic for another day, but it’s important to remember: once your analytics are set up, your job isn’t done.

Chapter 3

Know Thy Customer: Why You Need Qualitative Research for Effective CRO

A while ago when we were still consulting in ecommerce, we started working with an ecommerce client who wanted us to help optimize their checkout page for conversions.

On the surface, they had everything they thought they needed to A/B test: $5 million in annual recurring revenue stemming from high traffic, plus a solid analytics program and the ability to analyze that data.

They’d been running some tests on their own and doing a good job, carefully measuring results and making sure the conversion sample sizes were large enough to be statistically significant.

When we started working with them, though, they couldn’t solve the mystery of their checkout page.

They’d tested a lot of things – colors on the checkout page, the placement of various elements, etc., and they couldn’t figure out why their data told them people abandoned their checkout process on that page at an alarming rate.

We looked at the tests – they were solid.

We looked at the data and analytics – also solid.

Then we asked, “What do your customers say?”

They hadn’t thought to ask. So we helped them ask.

As it turned out, their trust elements were scaring people away.

The company, proud of its commitment to security, had recently added, “We are now using Komodo, the best security there is,” to their “verified by Visa” and SSL certificate info – and it was scaring people away.

“Why would they need the best security? It seemed like maybe there was a reason, like something bad had happened in the past,” one customer told us. Others echoed those sentiments.

We ran the tests based on that idea, and they confirmed – the line about Komodo had to go. We ended up replacing it with just the logo mark instead.

And then we helped the company set up a robust customer research program, the qualitative complement to their already successful quantitative analytics program.

There were two lessons at play here that we see a lot in testing and conversion rate optimization:

1. Challenge your assumptions through testing, and
2. Put your data into human context, always.

The human brain is only knowable to a point. We can spend forever crunching datasets and watching people interact with our pages and products, and we still might not have the faintest inkling of what’s going on in their minds as they consider whether or not to buy our product.

Data can tell you a lot, but to get context, you need to talk to people.

How to Approach Qualitative Customer Research

The category “customer research” encompasses a lot of things, but for e-commerce stores, we prefer to think of four major ways you can approach it:

Calling and interviewing customers

This is the most straightforward method and the one that’s been around the longest. Marketers and sales professionals have been utilizing customer calls for decades.

Most often, you’ll do this by recruiting people from your website or because they’re existing customers. That can be done through some kind of automated form or during the follow-up phase via email.

Many e-commerce operators will also compensate the customer for their time with a discount or some other offer. We’d recommend trying a friendly, non-incentivized approach first, but understand that a lot of times incentives can help you get more calls confirmed and spend less time trying to get people to agree.

Regardless of how you recruit or incentivize the customer, you should move the conversation to the phone. Email surveys can be a good tool (see below), but if you’re looking for a candid conversation about your product, a live conversation works best. You’ll be able to ask follow-up questions and hear their tone.

You can also supplement phone calls with on-page chat apps like Intercom or Drift. These won’t replace voice or face-to-face interviews, but they can lower the barrier to having a conversation with a real customer.

Tracking interactions

Some of these tactics will overlap with quantitative analytics and usability testing a bit, but tracking interactions gives you the ability to consider your customer’s actions in the context of your real page.

Rather than having to look at a datastream and figure out what actions the numbers represent, tracking the way customers interact with elements of your page shows you exactly what actions they take – and don’t take.

This can include things like heat maps, analytics, scroll mapping, referral tracking and more.

Surveying customers

This one’s easy to lump in with interviewing, but surveying is a lower-barrier way to get customer feedback while still providing you useful information. If you’re having trouble getting enough people to agree to phone calls or other in-person interviews, surveying can fill that gap.

You should take their results separately from your interviews, however, as there won’t be as much nuance since you can’t ask follow-up questions or clarify answers on the spot.

To gather surveys, you can implement a user form on your website – you can even call it your store’s annual survey – and then ask questions about customer demographics, how they use your product, and what they want from your product and experience on your site. The goal here is to understand how your product fits into the context of their lives.

Usability testing

This type of customer research offers you “live” tracking – rather than relying solely on something like a heat map, which shows aggregated activity after the fact, usability testing lets you watch users interacting with your website in real time.

This comes in handy when you’re trying to understand how your customers physically interact with certain elements on the page. It won’t necessarily tell you why, but watching users can give you a lot of insight. It’s especially useful for multi-stage operations like onboarding or checkout sequences, or to see if elements of your page are as intuitive as you think they are.

(It’s worth mentioning that sites like UserTesting ask the tester to narrate their thoughts as they’re navigating your page, giving you insight into actions and thoughts.)

Best Practices for Surveying

Just get started

Sometimes the hardest thing is getting started. We get that, so our advice if you’re thinking about implementing a customer research program is to just start. Try one thing first, just one campaign.

If you have an email list (and you really, really should), then start there. Examine what areas of your e-commerce site you’d like to work on optimizing first, and create a brief survey asking about customers’ experiences in that realm. Then send it to your email list with a friendly introduction asking them to do you a favor.

You can expand to including that survey on your website, and then move toward asking questions through chat. Starting to add a customer research component to your CRO doesn’t mean you have to sign up for all the fanciest tools right away – just ask your customers about your product and their experience on your site.

Craft the right discovery questions

You wouldn’t ask a personal trainer about stock tips or a stockbroker about a workout routine – don’t ask your different audience segments the same questions.

Generally speaking, you have three main types of people you want to reach through your customer research efforts:

  1. Qualified nos
  2. Customers who bought moments ago
  3. Existing customers

And in general, you’d like to find out four things from each of those groups:

  1. Uncover where customers come from
  2. Discover appeals
  3. Understand reservations
  4. Understand position relative to your competitors

Here are some questions you can ask to uncover those things:

  • Why did you choose us?
  • What do you use us for?
  • What value have you gotten out of it lately?
  • What new things would you like to see?
  • Are there any aspects to our products or shopping experience that you find frustrating, or which you’d be likely to change?
  • What’s the one thing that nearly stopped you buying from us?
  • What was your biggest fear or concern about using us?
  • What was your biggest challenge, frustration or problem in finding the right product online?
  • Where exactly did you first find out about us?
  • What persuaded you to purchase from us?
  • Please list the top three things that persuaded you to use us rather than a competitor.
  • Which other options did you consider before choosing our product?
  • Which of our competitors, both online and offline, did you consider before choosing our product?
  • How were you recommended to us?
  • Did you take a look at any of our competitors?
  • On a scale from 0 to 10, how likely are you to recommend us to a friend or colleague?
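
The last question in this list is the one commonly used to calculate a Net Promoter Score. The guide itself doesn't name that metric, so treat the following as an assumption about how you might summarize the answers: the conventional calculation counts ratings of 9 or 10 as promoters and 0 through 6 as detractors. The sample ratings are made up.

```python
def net_promoter_score(ratings):
    """Percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Made-up answers to the 0-to-10 recommendation question.
sample_ratings = [10, 9, 9, 8, 7, 7, 6, 5, 10, 3]
score = net_promoter_score(sample_ratings)
print(score)
```

Here four promoters and three detractors out of ten responses give a score of 10. Tracking a single number like this over time can complement, but not replace, the open-ended answers to the other questions.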

Combining Quantitative and Qualitative Research to Identify Testing Opportunities

Of course, asking people what they’re thinking, even in the most targeted and unbiased of ways, doesn’t always get you information you can use.

People might lie. They might not understand a subconscious thought process. They might misremember. They might tell you what they think you want to hear.

Humans are, well, human.

So relying solely on customer research isn’t a good idea either. The best foundation combines insights from your analytics and customer research to give you a more three-dimensional picture.

We always call this the “foundation” of your testing process, because taken together, your qualitative and quantitative data can tell you what to test.

There are an infinite number of things you can test on any page at any time. Remember our client from the beginning of the article, the ones wondering what on earth was wrong with their checkout page? They tested so many things, but only the combined analytics and customer research data pointed them toward the right thing.

If you spend the time and make the effort to build out a reliable analytics program and a customer research program, you can rest assured that the tests you develop and run will provide actionable results.

Chapter 4

How to Create Effective A/B Tests from Scratch

So you want to create effective A/B tests? Did you ever compete in a science fair as a kid? You came up with a hypothesis – a plant grows much better in direct sunlight, for instance – and then created a scientifically sound experiment to test your theory.

You made sure to control for only what you were testing – sunshine – by using all the same types of plants and soil, and the exact same amount of water in every experiment. You grew your plants in various amounts of sunlight and recorded the results. When it was done, you created a three-sided board to display your results and analysis.

The controlled-variable science experiment analogy applies perfectly to creating effective A/B tests for e-commerce stores.

Of course, you’re dealing with way more variables and ambient noise than a kid growing plants for school, but the principles stay the same.

Step 1: Where to Begin Testing

We constantly emphasize to our clients that effective testing is a long-term game. It’s not always a popular stance in the era of “move fast and break things,” but the whole point of testing is so that you move in only the right directions and don’t waste time or money breaking big things that matter.

Effective testing begins with you.

Testing begins with your mindset, your resources, your e-commerce store – everything that’s available to you and unique to you.

Strengthening your understanding of the testing process and integrating testing into your team’s process gives you the ability to reel off successful tests consistently – thus raising your revenue through improved conversions.

Testing is a way to build up a systematic approach to optimizing conversions across your entire organization. It’s not something you just “start doing” or “implement” overnight. Like our favorite weightlifting analogy, it’s a process. No one walks into the gym and lifts 300 pounds on their first day. First you need to learn how to lift properly and how to fuel your body for effective lifting.

Step 2: Figuring Out What to Test

Since you’re not jumping in and lifting 300 pounds on day one, you’re also not rolling up your sleeves and testing a complete redesign of your homepage on day one.

Even if you already have a little testing experience, we usually recommend against our clients starting with full-page redesigns. If you have no testing experience at all, definitely start with something smaller.

It’s important to start with a balance of something that gives you a true lift but also won’t take forever to set up and run. You’ll want to be able to work quickly through the process of setting up, monitoring and analyzing the test results with your team and testing agency.

Getting through some tests early on also shows you right away any flaws you might have in your testing approach and how to fix them before you start testing something huge, like crucial design elements on your highly-trafficked homepage, for instance.

Here are some ideas on tests you can set up:

  • Customer flow through your checkout, removing clutter
  • Copywriting, or specific words you use in headlines, buttons and more
  • Use of social proof
  • Use of security indicators
  • Colors
  • Navigation and search improvements
  • Videos, animation or none
  • Call-to-action elements, including colors, copy, and location
  • Fonts and text size
  • Identifying and removing distractions from pages

It’s also worth mentioning here that you should be sure your testing tool is capable of and configured to measure the right things – and that you’ve chosen the right measurements for the test you’re running.

For instance, if you’re testing the effectiveness of two landing page headlines and you’re measuring conversions as “clicks on the CTA button,” your results could be skewed by visitors’ reactions to the CTA button itself, which wouldn’t have anything to do with the headlines.

This example from Copyhackers shows this issue in action. Although the instance on the right saw an almost 124% higher conversion rate, it’s hard to know whether to attribute the lift to the page copywriting or the button CTA copy changes:

Keep your experiments single-variable (just like in your school science experiments) by changing only one thing at a time.

Step 3: Setting Up a Statistically Significant Test

Perhaps the hardest part of A/B testing is making sure your test will mean something.

By that, we mean that once you’ve gone through all the work of setting up, running and analyzing your test, you can trust that the results mean what you think they mean.

For instance, if you run a test trying to figure out what color to make your homepage button, but you only get 20 visits, and 15 think it should be blue, while only 5 think it should be green, you could conclude that it should definitely be blue…except that’s not a lot of visits, and what if the next 7 people pick green? Your results would feel way less conclusive.
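
To put rough numbers on that intuition, here is a minimal sketch in Python that computes a 95% Wilson score interval for the share of visitors preferring blue, before and after the seven extra green picks. The helper name wilson_interval is our own, and the Wilson interval is just one common choice for small samples; treat it as an illustration, not as the method any particular testing tool uses.

```python
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score interval for a proportion (illustrative helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    spread = (p * (1 - p) / trials + z**2 / (4 * trials**2)) ** 0.5
    half_width = z * spread / denom
    return center - half_width, center + half_width


# 15 of the first 20 visitors prefer blue.
low_20, high_20 = wilson_interval(15, 20)
print(f"After 20 visits: {15 / 20:.0%} blue, range {low_20:.0%} to {high_20:.0%}")

# Seven more visitors all pick green: still 15 blue, now out of 27.
low_27, high_27 = wilson_interval(15, 27)
print(f"After 27 visits: {15 / 27:.0%} blue, range {low_27:.0%} to {high_27:.0%}")
```

Both ranges are wide, and after the seven extra visits the range straddles 50 percent, which is another way of saying the data can no longer tell blue from green.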

The issue here is a sample size that’s too small. A lot of sample size calculators exist for figuring out how to run a statistically sound and significant test, but many inexperienced testers fall prey to the illusion that they can enter in their parameters and the calculator will spit out perfect, exact answers to the number of observations or length of test necessary.

You need at least 1,000 transactions – or around 25,000 unique visitors – in order for any kind of testing to make sense.

Testing guru Evan Miller summed up this fallacy and issue so perfectly that we couldn’t say it better:

When an A/B testing dashboard says there is a “95% chance of beating original” or “90% probability of statistical significance,” it’s asking the following question: Assuming there is no underlying difference between A and B, how often will we see a difference like we do in the data just by chance? The answer to that question is called the significance level, and “statistically significant results” mean that the significance level is low, e.g. 5% or 1%. Dashboards usually take the complement of this (e.g. 95% or 99%) and report it as a “chance of beating the original” or something like that.

However, the significance calculation makes a critical assumption that you have probably violated without even realizing it: that the sample size was fixed in advance. If instead of deciding ahead of time, “this experiment will collect exactly 1,000 observations,” you say, “we’ll run it until we see a significant difference,” all the reported significance levels become meaningless. This result is completely counterintuitive and all the A/B testing packages out there ignore it, but I’ll try to explain the source of the problem with a simple example.

We generally recommend using Evan Miller’s A/B test sample size calculator to help avoid this issue, and to make sure you’re keeping an eye on statistical significance at all times as you set up your tests.
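
For a sense of what a calculator like that is doing, here is a minimal sketch of a standard textbook sample-size approximation for comparing two conversion rates. It is not Evan Miller’s code and may not match his calculator’s output exactly. The function name, the 4% baseline (the ratio implied by 1,000 transactions per 25,000 visitors), the 20% relative lift, and the 5% significance / 80% power settings are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided test
    comparing two conversion rates (textbook normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Hypothetical store: 4% baseline conversion, hoping to detect a 20% relative lift.
n = sample_size_per_variant(0.04, 0.20)
print(f"Visitors needed per variant: {n:,}")
```

With these assumed inputs the answer lands at roughly 10,000 visitors per variant, and it grows quickly as the lift you want to detect gets smaller. The point is to fix this number before the test starts.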

Step 4: Monitoring Your Test – Patiently!

This step may actually be the toughest part of testing, because we’re wired by human nature to want to peek at our ongoing tests.

The advice not to peek at your tests is everywhere, and sometimes our clients will assume it’s because they’ll somehow ruin the test by looking early.

Here’s the secret: nothing will happen if you peek at your test early.

The reason a lot of testing consultants and experts will warn you against it is to protect you from yourself. It’s almost impossible to resist acting based on what you see when you look early. Even if you try not to, you’ll have that information in the back of your mind. It’s best to just not look!

Step 5: Running Your Tests the Right Duration

This ties in with “Step 3: Setting Up a Statistically Significant Test” because when you stop a test depends a lot on when you’ve collected enough observations of an action to make your test statistically meaningful.

For instance, if you calculate that you need a certain number of observations to reach a sound test, then you can stop your experiment after you’ve reached that number.
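
As a rough sketch of how that stopping point translates into a planned duration, the snippet below divides the total observations needed by the daily traffic entering the test. The function name and every number in the example (10,317 visitors per variant, two variants, 1,500 visitors a day) are hypothetical; plug in figures from your own sample size calculation and analytics.

```python
import math


def planned_duration_days(sample_per_variant, variants, daily_visitors_in_test):
    """Days needed to collect the pre-calculated sample, rounded up."""
    total_needed = sample_per_variant * variants
    return math.ceil(total_needed / daily_visitors_in_test)


# Hypothetical inputs: 10,317 visitors per variant, A and B, 1,500 visitors per day.
days = planned_duration_days(10_317, 2, 1_500)
print(f"Plan to run the test for about {days} days")
```

Deciding the duration this way, before launch, gives you a stopping rule that does not depend on what the results look like along the way.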

This reinforces the need to calculate your sample sizes based on significance and not to simply run your experiments until you see significant change.

Here’s an example from Evan Miller showing why stopping the experiment in the latter situation is problematic:

Example A: Experiment run until a statistically significant number of observations reached

Example B: Experiment run until statistically significant difference in instances of the observations

As you can see in the second example, in two scenarios the test was stopped too early. This gives you completely skewed results that would over-emphasize the percentage change of conversions – probably leading you to make a change to the thing you’re testing, potentially to your detriment.
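
The effect of stopping at the first significant result can also be seen in a small simulation. The sketch below runs many A/A tests, where both variants share the same true conversion rate, so every “winner” is a false positive. It compares a single check at a fixed sample size against checking repeatedly and stopping at the first significant result. All names and parameters here (1,000 runs, 1,000 visitors per arm, a look every 100 visitors, a 20% conversion rate) are our own assumptions, and the pooled z-test is a common stand-in rather than the calculation any specific tool performs.

```python
import random
from statistics import NormalDist


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_false_positives(runs=1000, n_per_arm=1000, peek_every=100,
                             rate=0.2, alpha=0.05, seed=42):
    """Return (fixed-sample false positive rate, stop-when-significant rate)."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        stopped_early = False
        for i in range(1, n_per_arm + 1):
            # Both arms convert at the same true rate: there is no real winner.
            conv_a += int(rng.random() < rate)
            conv_b += int(rng.random() < rate)
            if i % peek_every == 0 and not stopped_early:
                if z_test_p_value(conv_a, i, conv_b, i) < alpha:
                    stopped_early = True  # would have declared a winner here
        peeking_hits += int(stopped_early)
        fixed_hits += int(z_test_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha)
    return fixed_hits / runs, peeking_hits / runs


fixed_rate, peeking_rate = simulate_false_positives()
print(f"False winners, one check at the planned sample size: {fixed_rate:.1%}")
print(f"False winners, stopping at first significant look:   {peeking_rate:.1%}")
```

The fixed-sample rate should sit near the nominal 5 percent, while the stop-when-significant rate should come out several times higher, even though nothing about the two variants actually differs.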

Step 6: Analyzing the Results Properly

Your first action when you reach this step? Check your results. Then check them again. And maybe again, for good measure. (Ask our founder Emir about the time he saw a very popular testing software serving the control treatment to 100% of a client’s visitors while the software showed an active test running the entire time.)

Fancy software and shiny calculators can tell us a lot, and they make testing a lot easier than in the olden days, but because we’re optimizing for humans, we need to always give everything a human eye.

We recommend calculating and validating results using chi-squared tests like this one:
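
As a rough illustration of what such a check involves, here is a minimal Pearson chi-squared test on a 2x2 table of conversions, written with only the Python standard library. The function name and the visitor and conversion counts are hypothetical, and the sketch applies no continuity correction, so expect small differences from calculators that do.

```python
import math


def chi_squared_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-squared statistic and p-value for two conversion counts."""
    observed = [
        [conv_a, n_a - conv_a],  # variant A: converted, did not convert
        [conv_b, n_b - conv_b],  # variant B: converted, did not convert
    ]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-squared > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical results: control 200 of 5,000 converted, variation 250 of 5,000.
chi2, p_value = chi_squared_2x2(200, 5000, 250, 5000)
print(f"chi-squared = {chi2:.2f}, p-value = {p_value:.3f}")
```

For these made-up counts the p-value comes out below 0.05. If your own recalculation disagrees sharply with what the testing tool reports, that is a signal to dig into how the test was set up and served.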

If you’re relying solely on software like VWO and Optimizely, then you should know those two tools have hidden all of the information about their actual analyses – presumably to keep their methodologies secret as they engage in a shootout for testing supremacy.

We’re not saying you need to doubt all results from VWO or Optimizely, but it does become impossible to independently verify your results when the actual analysis is hidden behind an opaque black box.

Remember how our overall testing method combines quantitative analytics and qualitative user research before we reach the testing stage? We do that so we always have a human perspective on data – and so we have cold, hard data to counteract human biases. This complementary relationship between data and your creativity and instincts also applies when analyzing your test results.

Your New Motto: Move Deliberately and Improve Things

In ecommerce and business in general these days, we constantly hear the refrain “Move fast and break things.”

We understand the sentiment behind it – this mindset encourages you to experiment and try things, put ideas into action as quickly as possible, and move on to other things if the actions fail to pan out fast. It helps organizations avoid stagnation or endless planning loops without action.

The problem, though, is it also encourages hasty testing and iteration based on potentially incomplete results.

Testing requires patience, but it doesn’t mean stagnation. Applying science rather than gut instinct all the time results in making changes proven to result in higher conversions – and thus, higher revenue.

Creating a testing system and integrating a testing mindset and process into everything you do will help you create sustainable gains across the board. Rather than running headlong without a destination in mind, you’ll move steadily and inexorably toward your goals – and those revenue gains.

Chapter 5

Building a Sustainable Testing Culture

Which is more dangerous to your e-commerce store: your data or your intuition?

Answer: If you’re using one without the other, they could both be disastrous.

The case for intuition without data being more dangerous:

A few years ago, a study of 800 Fortune 1000 marketers reviewing their relationship with data revealed that marketers used data to make customer-related decisions only 11 percent of the time. The other 89 percent of the time, they relied on intuition.

Without data, intuition could lead you anywhere. You might blow up a highly successful onboarding process on a hunch. You might change your checkout flow to solve a cart abandonment issue, when all you needed to do was add a few confidence factors. It’s the epitome of flying blind, no matter how good you think your gut might be.

The case for over-reliance on data being more dangerous:

In the same study, marketers revealed that, on the opposite end of the spectrum, 11 percent of the time they “couldn’t get enough” of the massive data streams we have access to now. They used data for every decision – and made decisions based on data too often. Without a proper testing structure, iterating based on every blip or bump in your data can be just as dangerous as flying blind with only your intuition to guide you.

The truth is, the best way to make decisions for your e-commerce company is through combining data and intuition – and creating a sustainable testing culture to help grow your revenue.

More e-commerce companies should make a commitment to creating a culture of testing. We’re talking about a comprehensive, end-to-end approach to qualitative and quantitative information, functional design and effective split testing, a commitment that stretches from your C-suite to your interns. When everyone in your company approaches problems with a testing mindset, you’ll have a solid foundation for growth.

Why (a) Testing (Culture) Matters

In an excellent Harvard Business Review article, Dan Ariely notes that companies want to give weight to intuition or expert opinion because that’s what we’re conditioned to do: drive toward answers.

“When we pay consultants, we get an answer from them and not a list of experiments to conduct. We tend to value answers over questions because answers allow us to take action, while questions mean that we need to keep thinking. Never mind that asking good questions and gathering evidence usually guides us to better answers.” – Dan Ariely, HBR

When confronted with more questions than answers, or answers you can’t be sure of, testing is the risk reduction process you’re looking for.

Even if you’re not testing, you’re going to make changes to your business. That’s the nature of business, after all. You’ll change your web properties, your app, your messaging, your checkout process – you’ll change all of that at some point, because you understand that businesses can’t stagnate. Things move too fast. You need to iterate. You need to stay ahead of complacency.

But what are you changing? And why? Are you relying on intuition without data? Or data without context?

Creating a testing culture that prioritizes optimization should be a natural stage for serious companies.

Organizations need a testing culture and a testing mindset because a testing culture:

  • Removes the risk of changing the wrong things and moving your business in the wrong direction.
  • Ensures you always know what your customer wants through qualitative testing.
  • Gives you a way to contextualize your quantitative data to make real analysis.
  • Helps you become so practiced at the cycle of hypothesizing, testing, adjusting, analyzing and iterating that when you need to make major changes and conduct major tests, you can trust your process and your results.
  • Instills a “challenge everything” mindset in your team, which means no lazy conclusions or wild guesses make it into your business decisions.

Before you start emphasizing optimization and creating your testing culture, make sure you fully understand these guiding principles:

1. Testing isn’t about proving yourself right
Intuition has its place, but you shouldn’t go into your tests with the intention of confirming or disproving your hunches. Keep your agenda clear. Your results and analysis can show you a concrete way forward when you have a lot of options, but be wary of influencing your analysis with preconceived notions.

2. Start at the beginning
As recommended by Optimizely, you’ll want to build testing into everything you do that could involve making a lot of assumptions, especially early on. Optimizely uses the example of your information architecture and content strategy, but this could be applied across a lot of areas in an e-commerce operation.

3. Testing belongs to everyone – encourage collaboration
By design, testing should remove the rule of what ConversionXL’s Peep Laja calls a HiPPO (highest-paid person’s opinion). A testing culture should encourage team members to work together to form tests and give input on analysis.

For instance, a test run by the marketing automation team might turn up results that your customer success team would find extremely valuable, or your sales team might have valuable input for a copywriting split test based on what they know about sales-qualified lead behavior. If your team tests in silos, none of that information would be shared.

Part of collaboration is keeping everyone informed. Transparency about who’s doing what can help avoid duplicate work or incomplete analysis. You can use something like this testing roadmap, similar to a product roadmap, to keep everyone abreast of what experiments are going on.

4. Some tests should be about short-term wins
Not every test measures something earth-shattering. Part of a testing culture is infusing the testing ideology into everything you do, so sometimes you’ll be testing something small, like an inline CTA vs. an image-based CTA. But “smaller” tests don’t have to produce small results – if an experiment on those CTAs gains you one percent more conversions, then you’ve gained a lot for little effort.

Balancing quick wins with longer-term “major” tests (like full homepage tests) also means your team gets more testing experience and can stay nimble to respond to unanticipated changes in your customer base.

5. Embrace functional design
Everything you do to any of your customer-facing properties – including your website and apps – will affect your customer’s experience. Functional design involves finding the balance between optimized design and aesthetically pleasing design. All elements should have a function and exist for a reason, and testing helps you find that reason. Combining the aesthetic sensibilities of design with the functional practicality of CRO helps you create e-commerce experiences that are both enjoyable and optimized.

6. Nothing is “done”
Testing should beget more tests. As Dan Ariely put it in the quote above, “asking good questions and gathering evidence usually guides us to better answers.” The more good questions we can pose around the “why” and “how” of our business decisions, the better our answers will become.

Maintain an attitude of cycles. It’s all too easy to complete a big project like a website redesign, call it “done” and not revisit it again for years. If you maintain the attitude that nothing is “done” and everything needs to be tested on a schedule, you’ll avoid becoming outdated or drifting away from what your customers want.

Focusing on the Long Run

Testing takes time, but the results are worth it. Applying testing to business decisions results in incremental but sustainable gains (Forbes found as much as 241 percent ROI). “Move fast and break things” results in a string of fast moves and broken things, but not necessarily a clearer picture of where you should go and what you should be focused on.

A testing culture breeds the sort of sustainable momentum that builds lasting companies. Employees who ask “why” before “how.” Processes designed to eliminate waste and promote efficiency. A company focused on long-term goals rather than short-term puffs of smoke.

In an industry obsessed with the latest and greatest, companies dedicated to meticulously building testing into their culture can experience the kinds of results others can only wish to see. Companies built on a testing culture will build what their customers want, and their customers will respond accordingly.