Bonus Material: <checklist-2>51 Questions to Help Define User Personas</checklist-2>
Testing is the process that differentiates conversion rate optimization from other website improvement activities. It’s the opportunity to do a real-time test with your actual audience, and get an accurate estimate of how your improvements will perform. This process gives optimizers a great advantage, especially on ecommerce websites.
Since testing and its statistical methods can sound complicated, we’ve dedicated this entire article to testing.
Hypothesis creation phase
Once you complete your research, your next task is to come up with viable hypotheses for improvement.
This sounds easy, but can be deceptive. While “making a website more credible” may sound like a great improvement idea, it’s not enough for a hypothesis.
Hypotheses need to contain the following:
- What the issue is
- How it affects conversion
- How many prospects are affected by it
- What you could do to improve the process
- How much effort is necessary
- What's the improvement potential
Viable and test-worthy hypotheses contain answers to these questions.
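As an illustration, the checklist above can be captured as a simple record. (A sketch only: the field names and example values below are our own, not a standard schema.)

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    issue: str              # what the issue is
    conversion_impact: str  # how it affects conversion
    affected_share: float   # fraction of prospects affected (0.0 to 1.0)
    proposed_fix: str       # what you could do to improve the process
    effort: str             # how much effort is necessary ("low"/"medium"/"high")
    potential_lift: float   # estimated improvement potential (0.10 = +10%)

# A hypothetical hypothesis built from the checklist
h = Hypothesis(
    issue="Prospects doubt product quality",
    conversion_impact="Visitors abandon the product page before adding to cart",
    affected_share=0.45,
    proposed_fix="Add a quality seal and testimonials to the product page",
    effort="low",
    potential_lift=0.10,
)
print(h.issue, "-", h.proposed_fix)
```

Writing hypotheses down in a structured form like this makes it easy to compare and prioritize them later.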
Usually, hypotheses are divided into “weak” and “strong”. Weak hypotheses are those that won’t affect many users, or concern minor aspects of the conversion process. Strong hypotheses address major issues and have a potentially large effect on conversions.
Avoid using the probability of success as a criterion for hypothesis creation. If you knew this information, would you really need to test?
Refining ideas into hypotheses
By definition, to create viable hypotheses, you’ll need to have a clear idea of what you are solving and how.
You've gone through your research data and figured out your ecommerce issues.
Sometimes, an issue will have an obvious solution, and all you have to do is fix it. (Think of this as the “just do it” category.) This is usually the case with technical issues and some heuristic issues.
Sometimes you may identify an issue that has several possible solutions, such as “prospects are worried about product quality”. (This is a common ecommerce issue.)
Your research will show how many prospects are affected by this issue, which will give you a wide range of improvement to work on. Next, you need to decide how to solve the issue, and gauge the effort needed to solve it.
There are multiple ways to solve this issue. Your hypothesis should list all of those or their combinations.
You may hypothesize that adding testimonials and social proof relevant to the specific product will help prove its quality to prospects. Another hypothesis might be adding a quality seal or certificate from a widely recognized institution.
Finally, you can opt to include a product video, and prove the quality of the product by presenting its features.
Each hypothesis can look like this:
“45% of prospects are not convinced of the quality of our product, according to the survey we conducted. To solve this issue, we propose adding a quality seal to the product page, which we expect to result in at least ⅓ of the prospects deciding to buy (this estimate is backed by responses to the question “What would make you buy?” in exit surveys). We expect this to increase our sales by 10%, based on the number of existing customers and relative number of survey respondents.”
The next step is to define the effort your solution will take in terms of coding and designing the actual variations.
Pro tip: The more effort a hypothesis takes to implement, the less attractive it becomes. Give priority to hypotheses that are easy to test, so you can increase your ROI with minimal effort and investment.
Once you’ve listed all of your hypotheses, you need to decide which to test first.
Let’s briefly examine a few existing models for hypothesis prioritization and how to apply them.
1. PIE Model
This model of hypothesis prioritization ranks hypotheses according to their Potential [for improvement], their Importance, and their Ease.
- “Potential for improvement” means the likelihood that the hypothesis will result in an overall improvement.
- “Importance” refers to the relevance of the issue.
- “Ease” relates to the effort necessary to implement the hypothesis.
In the PIE model, each factor is scored on a scale of 1 to 10, and hypotheses are ranked from the highest to the lowest combined score.
The weakness of this model is that the “Potential” part of the equation is often hard or impossible to estimate, and may be defined arbitrarily.
This leads to incorrect prioritization, and possibly solving only minor issues on the assumption that they’ll achieve significant conversion improvements.
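To make the ranking concrete, here is a minimal sketch of PIE scoring in Python. (Averaging the three 1-to-10 scores is a common way to combine them, but weightings vary; the hypothesis names and scores below are hypothetical.)

```python
def pie_score(potential, importance, ease):
    """Combine the three PIE factors by averaging their 1-10 scores."""
    for s in (potential, importance, ease):
        if not 1 <= s <= 10:
            raise ValueError("PIE scores must be between 1 and 10")
    return (potential + importance + ease) / 3

# Hypothetical hypotheses and scores
hypotheses = {
    "Add quality seal": pie_score(8, 9, 7),
    "Rewrite headline": pie_score(5, 6, 9),
    "Redesign checkout": pie_score(9, 9, 3),
}

# Rank from highest to lowest priority
for name, score in sorted(hypotheses.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.1f}")
```

Note how the subjectivity problem shows up immediately: the “Potential” argument passed to `pie_score` is just a guess, and a different guess reorders the whole list.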
2. TIR Model
This model uses Time, Impact, and Resources as its main factors. The ranking is similar to the PIE model, except the scale runs from 1 to 5.
Developed by CRO veteran Bryan Eisenberg, this model is tied to the research model Plan, Measure, Improve, and works best when applied with that model.
3. ICE model
The Impact, Confidence, and Ease model is very similar to PIE, except it uses a confidence factor in place of “potential”. Like PIE, it is highly susceptible to subjective opinions, which makes it potentially risky.
4. PXL model
The PXL model was developed by ConversionXL, one of the leading conversion rate optimization agencies.
This model is the most complex of the four listed here, as it tries to take into account a number of objective and real indicators to rank hypothesis priority. It's complex to use and requires a lot of effort, which can be a downside.
Testing — Here we are at last!
The entire process of conversion optimization leads to this point, which is the most publicized aspect of conversion optimization. Voila: testing!
Basic concepts of statistics
Testing forms the foundation of the conversion optimization process. Without testing, it wouldn’t be possible to try different solutions and select the best-performing one.
However, to conduct testing properly, you need to follow rigorous statistical rules and be aware of them at all times — otherwise, you won’t extract any value from testing.
To be valuable, an experiment must be “significant” in the statistical sense: a variation needs to be tested against the control on a sample large enough to eliminate any reasonable chance of errors. The key points here are sample size, test duration, and significance level.
Let’s briefly define these concepts and their interaction to explain why each is important to testing.
The first concept to understand in statistics is sampling and sample size. The idea of a “sample” is to use a limited and finite number of observations to estimate the average values of an entire set of figures or population.
This definition sounds complicated, but it just means the ability to deduce the traits of a large group by measuring only a few members, and extending the results to the entire group.
The accuracy of any sample depends on several factors, and selecting a proper sample size is important for achieving accurate results. A good example is estimating whether a coin is “fixed” by tossing it. The actual chance of it landing on either side should be 50%.
However, to accurately test this hypothesis, 10 tosses won’t be enough. Let’s say we toss a coin 10 times and get results like this:
Heads, heads, tails, heads, tails, heads, tails, heads, tails, heads
With this result, we might conclude that the chance of the coin landing on heads is 60:40, but that is far from the truth. If you toss the coin 100 times, the results are likely to be closer to 50:50, and at 1,000 tosses, you should get very close to a 50:50 split. A result that deviates significantly from that would suggest the coin is “fixed”.
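The convergence described above is easy to simulate. (A quick sketch; the exact frequencies depend on the random seed, but the share of heads drifts toward 50% as the number of tosses grows.)

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Toss a fair coin n times and report the observed share of heads
for n in (10, 100, 1_000, 100_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>7} tosses: {heads / n:.3f} heads")
```

With only 10 tosses, the observed share can easily land at 0.3 or 0.7; at 100,000 tosses it sits very close to 0.5. Small samples mislead.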
The same is true of testing variations on a website. Your control page is the baseline, and any improvement must be detected by testing until you have a large enough sample to declare a winner.
The “significance” level is the first thing that determines the correct sample size. It's an indicator of the reliability of your results: essentially, it denotes how unlikely it is that the difference you observe is due to random chance rather than a real effect. Testing usually relies on 95% significance by default, which means accepting at most a 5% chance of “finding” an effect that does not actually exist. Any lower significance level increases that risk.
The best way to calculate the sample size is to use one of the many online sample size calculators, such as Evan Miller’s sample size calculator. Use a sample size calculator to determine whether you can test at all!
Sample size also depends on a “minimum detectable effect”: the smallest improvement you want your test to be able to detect. Be aware that the smaller the change or effect you want to track, the larger the sample size you’ll need to detect the difference. But if you have a large amount of traffic and large revenue, even small improvements can have a big effect on revenue.
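To see how significance, power, and the minimum detectable effect combine into a sample size, here is a sketch of the textbook two-proportion formula. (Online calculators use slight variants of this formula, so their numbers may differ somewhat; 95% significance and 80% power are assumed here as common defaults.)

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variation(baseline_rate, min_detectable_effect,
                              alpha=0.05, power=0.80):
    """Visitors needed per variation to detect an absolute lift of
    `min_detectable_effect` over `baseline_rate` (two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% significance
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Hypothetical store: 3% baseline conversion rate, hoping to detect a lift to 4%
print(sample_size_per_variation(0.03, 0.01))  # several thousand visitors per variation
```

Notice the trade-off: halving the minimum detectable effect roughly quadruples the required sample, which is why small stores should chase big effects.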
For small websites, though, testing for small-scale changes is usually not cost-effective, even if it’s theoretically possible. It’s best for smaller stores to prioritize testing hypotheses with the greatest improvement potential.
Finally, always make sure you run tests for long enough, ideally in full-week increments. This helps eliminate seasonal effects and variations in site traffic caused by holidays or different days of the week.
The mechanics of testing & testing tools
Setting up and conducting experiments on websites is simple. Once you have your hypothesis, you can start creating variations to test it. Variations usually begin as wireframe mockups that provide a general idea of the layout of the webpage.
Once the idea is framed into a practical mockup of the page, you need to move forward with designing the variation. For smaller changes, you can use the visual editors built into testing tools. For more complicated changes, you’ll need actual web design tools.
Then, after you have a completed page test variation, you’ll import it into a testing tool. You’ll define the goal you want to track and compare between the original page and the variation. The tool itself does the necessary statistical calculations.
(You’ll need to know the sample size in advance, though, just to avoid the temptation to call a winner too early.)
The testing tool will randomly assign visitors to the control page and the variation, and compare the results. For ecommerce sites, you’ll most likely want to compare the effect of your changes using conversion rate, since that’s the best indicator of improvement.
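Under the hood, the statistical comparison such tools perform for conversion rates is essentially a two-proportion z-test. Here is a simplified sketch (real tools add continuity corrections and safeguards against peeking at results early; the visitor counts below are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided two-proportion z-test. Returns (z statistic, p-value)."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled conversion rate under the assumption of no real difference
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Control: 300 conversions out of 10,000 visitors (3.0%)
# Variation: 360 conversions out of 10,000 visitors (3.6%)
z, p = z_test(300, 10_000, 360, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # significant at 95% if p < 0.05
```

This is why sample size matters so much: the same 3.0% vs. 3.6% split on only 1,000 visitors per variation would not reach significance.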
However, you can also use other criteria to determine a winner, such as whether visitors fill a registration form or another trackable interaction. These criteria can be useful to test changes if you want to improve an aspect of your website other than purchases.
If you have a blog and you'd like to try different layouts of your blog interface to increase visitor engagement (hoping that visitors who read blog posts are more likely to convert), you can use session duration or scroll depth as an indicator of which layout variation performs best.
Continuously adapt & improve
Once you've addressed the issues you found in the first round of the conversion optimization and testing process, you need to continue refining the results.
Once you start testing, you should never stop! A testing mindset will always lead to constant improvement.
When you implement a winning solution, you’ve merely found a design that performs better than the previous one. There’s always the possibility that another variation may turn out to be even better.
Testing is a self-perpetuating activity. The success of the first round of tests (and removal of the largest obstacles to conversion) means that you can now start refining and adapting your website further.
Gradually, as you eliminate more and more issues, you’ll reach a zone called the “local maximum”. Essentially, this is a plateau from which no further significant improvements are possible with your current design.
The “local maximum” is your ultimate goal.
If you reach this point, then it might be time to consider a total overhaul of your website. In the meantime, you know the way to get there: testing and more testing.
<download-checklist-2>Download 51 Questions to Help Define User Personas NOW</download-checklist-2>