What is A/B testing?
A/B testing, also known as split or bucket testing, is a method where a control version of content (A) and a variant (B) are randomly presented to users to determine through statistical analysis which one more effectively meets specific conversion goals and appeals to viewers.
It originated with Ronald Fisher, a 20th-century statistician and geneticist widely credited with developing the principles and practices that make this method reliable.
As it applies to marketing, A/B testing can be traced to the 1960s and 1970s, when it was used to compare different approaches to direct response campaigns.
Now, A/B testing is used to evaluate all sorts of initiatives, from emails and landing pages to websites and apps. While the targets of A/B testing have changed, the principles behind it have not.
Below, we discuss how A/B testing works, different strategies behind the experimentation process, and why it’s critical to your success.
How does A/B testing work?
A/B testing is a type of experiment. It presents two different versions of a website, app, or landing page to different users and measures the difference in how they respond.
Statistical data is collected and analyzed to determine which version performs better.
A/B testing compares two versions of a webpage. The control version is known as variation A. Variation B contains the change that is being tested.
A/B testing is sometimes referred to as randomized controlled testing (RCT) because it ensures that sample groups are assigned randomly, which helps produce reliable results.
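To make the random assignment concrete, here is a minimal sketch of how a testing tool might bucket visitors; the experiment name, visitor ID format, and 50/50 split are illustrative assumptions rather than any specific vendor's implementation.

```python
import hashlib

def assign_variation(user_id: str, experiment: str = "homepage-cta") -> str:
    """Deterministically bucket a visitor into variation A or B.

    Hashing the visitor ID (instead of calling random() on every visit)
    keeps the assignment stable for returning visitors while still
    splitting traffic roughly 50/50.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

print(assign_variation("visitor-123"))  # the same visitor always sees the same variation
```

Keeping the assignment stable matters: a visitor who bounced between variations on repeat visits would contaminate both samples.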
Why should you consider A/B testing?
When there are problems in the conversion funnel, A/B testing can be used to help pinpoint the cause. Some of the more common conversion funnel leaks include:
Confusing calls-to-action buttons
Poorly qualified leads
Complicated page layouts
Too much friction leading to form abandonment on high-value pages
Checkout bugs or frustration
A/B testing can be used to test various landing pages and other elements to determine where issues are being encountered.
Solve visitor pain points
When visitors come to your website or click on a landing page, they have a purpose, like:
Learning more about a deal or special offer
Exploring products or services
Making a purchase
Reading or watching content about a particular subject
Even “browsing” counts as a purpose. As users browse, they might encounter roadblocks that make it difficult for them to complete their goals.
For example, a visitor might encounter confusing copy that doesn't match the PPC ad they clicked on. Or a CTA button might be difficult to find. Or maybe the CTA button doesn't work at all.
Every time a user encounters an issue that makes it difficult for them to complete their goals, they might become frustrated. This frustration degrades the user experience, lowering conversion rates.
There are several tools you can use to understand this visitor behavior. Fullstory, for example, uses heatmaps, funnel analysis, session replay, and other tools to help teams perfect their digital experiences.
By analyzing this data, you can identify the source of user pain points and start fixing them.
Use both quantitative and qualitative data when analyzing a problem; together they help you identify the issue and understand its cause, no matter which tool you choose.
Get better ROI from existing traffic
If you're already getting a lot of incoming traffic, A/B testing can help you boost the ROI from that traffic.
By identifying which changes improve the user experience and lift conversions, A/B testing helps you get more value from the visitors you already have. This approach is often more cost-effective than investing in acquiring new traffic.
Reduce bounce rate
Bounce rate is a metric that calculates how often someone arrives on your site, views one page, then leaves.
Bounce rate is calculated by dividing the number of single-page sessions by the total number of sessions on your website.
There are also other ways to define a bounce rate, but they all imply the same thing: disengaged users.
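As a quick illustration of that calculation (the session counts below are invented):

```python
def bounce_rate(single_page_sessions: int, total_sessions: int) -> float:
    """Bounce rate = single-page sessions / total sessions."""
    return single_page_sessions / total_sessions

# 300 of 1,000 sessions viewed only one page, so the bounce rate is 30%
print(f"{bounce_rate(300, 1000):.0%}")
```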
Essentially, a high bounce rate indicates that people enter your website, encounter something confusing or frustrating, then quickly leave.
This type of friction is a perfect example of when to A/B test. You can identify the specific pages where visitors are bouncing and change the things you think are problematic. Then, use A/B testing to compare the different versions until you see an improvement.
These tests can help identify visitor pain points and improve UX overall.
Make low-risk modifications
There's always a risk in making major changes to your website.
You could invest thousands of dollars in overhauling a non-performant campaign. However, if those major changes don't pay off, then you won’t see a return on that investment. Now you've lost lots of time and money.
Instead, use A/B testing to make small, incremental changes rather than implementing a total redesign.
That way, if a test fails, you have risked much less time and money.
Achieve statistically significant results
It’s important to recognize that A/B testing only works well if the sample is large enough to produce statistically significant results. It doesn’t work if testers rely on assumptions or guesses when setting up the tests or analyzing the results.
Statistical significance is used to determine how meaningful and reliable an A/B test is. The higher the statistical significance, the more reliable a result is.
Statistical significance is "the claim that a set of observed data are not the result of chance but can instead be attributed to a specific cause."
If a test is not statistically significant, there could be an anomaly, such as a sampling error. And if the results are not statistically significant, they shouldn’t be considered meaningful.
The ideal A/B test result is 95% statistically significant, though testing managers sometimes accept 90% so that the required sample size is smaller and statistically significant results arrive faster.
There can be challenges to reaching a sufficient level of statistical significance:
Not enough time to run tests
Pages with exceptionally low traffic
Changes are too insignificant to generate results
It's possible to achieve statistical significance faster by running tests on pages that get more traffic or by making larger changes in your tests. In many cases, a lack of traffic makes it nearly impossible to get results that are significant enough.
Understanding sample sizes and statistical significance also helps you plan how long your tests will take.
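To see how traffic, effect size, and significance interact, here is a rough planning sketch based on the standard two-proportion sample-size formula; the baseline rate, target lift, and 80% power are hypothetical assumptions.

```python
from statistics import NormalDist

def sample_size_per_variation(p_control: float, p_variant: float,
                              significance: float = 0.95, power: float = 0.80) -> int:
    """Approximate visitors needed per variation for a two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - significance) / 2)  # ~1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)                        # ~0.84 at 80% power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_variant) ** 2
    return int(n) + 1

# Detecting a lift from a 10% to a 12% conversion rate at 95% significance
print(sample_size_per_variation(0.10, 0.12))  # roughly 3,800 visitors per variation
```

Dropping the significance level to 90%, or testing a larger change, shrinks the required sample, which is why those tactics make results arrive faster.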
Redesign websites to increase future business gains
If you do engage in a full website redesign, A/B testing can still be helpful.
Like any other A/B test, you will make two versions of the site available. Then, you will measure the results once enough visitors have seen each version to reach statistical significance.
A/B testing should not end once you’ve gone live. Instead, this is the time to begin refining elements within your site and testing those.
Important ideas to know while testing
There are specific strategies to consider when testing: multipage testing, split URL testing, dynamic allocation, and multivariate testing.
Split URL testing
Split URL tests are used for making significant changes to a webpage in situations when you don't want to make changes to your existing URL.
In a split URL test, your testing tool will send some of your visitors to one URL (variation A) and others to a different URL (variation B). At its core, it is a temporary redirect.
When should you consider split URL testing?
In general, use this if you are making changes that don’t impact the user interface. For example, if you are optimizing page load time or making other behind-the-scenes modifications.
Larger changes to a page, especially to the top of a page, can also sometimes “flicker” when they load, creating a jarring UX. Split URLs are an easy way to avoid this.
Split URL testing is also a preferred way to test workflow changes. If your web pages display dynamic site content, you can test changes with split URL testing.
Multivariate testing
Multivariate testing is a more complex form of testing, and an entirely different test type from A/B.
It refers to tests that involve changes to multiple variations of page elements that are implemented and tested at the same time. This approach allows testers to collect data on which combination of changes performs best.
Multivariate testing eliminates the need to run multiple A/B tests on the same web page when the goals of each change are similar to one another. Multivariate testing can also save time and resources by providing useful conclusions in a shorter period.
For example, instead of running a simple A/B test on a page, let's say you want to run a whole new multi-page experience. You want step 1 to be either variation A or B, and you want step 2 to be either C or D.
When you run a multivariate test, you'll be running many combinations of these variations:
A then C
A then D
B then C
B then D
A multivariate test is, essentially, an easier way to run multiple A/B tests at once.
Because there are more variations in multivariate tests than A/B tests, multivariate requires more traffic to achieve statistical significance. This means it will probably take longer to achieve reliable results.
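A small sketch of why the traffic requirement grows: every variable multiplies the number of combinations, and each combination needs its own share of visitors.

```python
from itertools import product

step_1 = ["A", "B"]  # two versions of the first page
step_2 = ["C", "D"]  # two versions of the second page

combinations = list(product(step_1, step_2))
print(combinations)       # [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
print(len(combinations))  # 4 arms instead of 2, so more traffic is needed
```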
Multipage testing
A multipage test applies a change to specific elements across multiple pages, such as every page within a particular workflow, rather than testing the change on a single page.
There are two common approaches. The first is to recreate every page of a sales funnel and test the new funnel against the control. This approach is known as "funnel multipage testing."
The second is to add or remove repeating items such as customer testimonials or trust indicators across pages, then test how those changes affect conversions. This approach is known as "conventional" or "classic" multipage testing.
Dynamic allocation
Dynamic allocation is a method of quickly eliminating low-performing test variations, which streamlines the testing process and saves time. It's also known as a multi-armed bandit test.
Let’s say you’re an online retailer and are holding a flash sale. You know you want as many people as possible to view your sale items when they arrive on your site. To do that, you want to show them a CTA color that gets as many people to click on it as possible — blue, pink, or white.
Using a dynamic allocation test, your testing tool will automatically detect which variation drives the most clicks and automatically show that variation to more users. This way you drive as many clicks as possible as fast as possible.
Because traffic is not split equally, dynamic allocation doesn't yield statistical significance and doesn’t yield any learnings you can use in the future.
Dynamic allocation is for quick conversion lifts, not learning.
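As a conceptual illustration only, here is a minimal epsilon-greedy sketch, one common multi-armed bandit strategy; commercial testing tools use their own allocation algorithms, so none of these names or parameters come from a specific product.

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy allocation across CTA color variations."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = arms
        self.epsilon = epsilon
        self.clicks = {arm: 0 for arm in arms}
        self.views = {arm: 0 for arm in arms}

    def choose(self) -> str:
        # Mostly show the best-performing color, occasionally explore the others.
        if random.random() < self.epsilon or not any(self.views.values()):
            return random.choice(self.arms)
        return max(self.arms, key=lambda arm: self.clicks[arm] / max(self.views[arm], 1))

    def record(self, arm: str, clicked: bool) -> None:
        self.views[arm] += 1
        self.clicks[arm] += int(clicked)

bandit = EpsilonGreedyBandit(["blue", "pink", "white"])
arm = bandit.choose()             # variation shown to the next visitor
bandit.record(arm, clicked=True)  # feed the observed result back in
```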
How do you choose which type of test to run?
There are several factors to consider when deciding which tests to run for conversion rate optimization (CRO) testing, including:
The number of changes you’ll be making
How many pages are involved
The amount of traffic required to get a statistically significant result
Finally, consider the extent of the problem you are trying to solve. For example, a landing page whose conversion rate might improve by changing a button color is a perfect use case for A/B testing.
However, changing multiple pages that a user encounters across their customer journey would be a better fit for multipage testing.
How do you perform an A/B test?
An A/B test is a method of testing changes to determine which ones have the desired impact and which do not.
While organizations once turned to A/B testing only occasionally, teams at 51% of top sites now A/B test to improve customer experience and boost conversions.
Step 1: Research
Before any tests can be conducted or changes made, it's important to set a performance baseline. Collect both quantitative and qualitative data to learn how the website in question is performing in its current state.
The following elements represent quantitative data:
Bounce rate
Video views
Traffic
Subscriptions
Average items added to the cart
Purchases
Downloads
Much of this information can be collected through a behavioral data platform like Fullstory.
Qualitative data includes information collected on the user experience through polls and surveys. Especially when used in conjunction with more quantitative data, it is valuable in gaining a better understanding of site performance.
Step 2: Observe and formulate a hypothesis
At this stage, you analyze the data you have and write down the observations that you make. This approach is the best way to develop a hypothesis that will eventually lead to more conversions. In essence, A/B testing is hypothesis testing.
Step 3: Create variations
A variation is simply a new version of the current page that contains any changes you want to subject to testing. This alteration could be a change to copy, headline, CTA button, etc.
Step 4: Run the test
You’ll select a testing method here according to what you are trying to accomplish, as well as practical factors like expected traffic. The length of the test will also depend on the level of statistical accuracy you want. Remember, a higher statistical significance is more reliable, but requires more traffic and time.
Step 5: Analyze results and deploy the changes
At this point, you can go over the results of the tests and draw some conclusions. You may determine that the test was indeed conclusive and that one version outperformed the other. In that case, you simply deploy the desired change.
But that doesn't always happen. You may need to add and test another change to gain further insights, or you might decide to move on to testing changes in another part of the workflow. With A/B testing, you can work through all of the pages within a customer journey to improve UX and boost conversions.
There are some best practices to use while A/B testing, but understand that all sites, apps, and experiences are different. The best practices for your situation can only be discovered through testing.
Server-side vs. client-side A/B testing
With client-side testing, a website visitor requests a particular page, which is delivered by the web server. JavaScript then executes in the visitor's browser and adjusts what is presented, based on which variation they should see according to the targeting you set up in the A/B test.
This form of testing is used for visible changes such as fonts, formats, color schemes, and copy.
Server-side testing is a bit more robust. It allows for the testing of additional elements. For example, you would use this form of testing to determine whether speeding up page load time increases engagement. You would also use server-side testing to measure the response to workflow changes.
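A minimal server-side sketch, assuming a Python Flask app; the route, cookie name, and headline copy are hypothetical. Because the variation is chosen before any HTML is sent, the visitor's browser never loads the other version or "flickers" between them.

```python
import hashlib
from flask import Flask, request

app = Flask(__name__)

def bucket(user_id: str) -> str:
    """Stable A/B assignment derived from a hash of the visitor ID."""
    return "A" if int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 2 == 0 else "B"

@app.route("/pricing")
def pricing():
    # Fall back to the client IP if no visitor cookie is present (hypothetical cookie name).
    user_id = request.cookies.get("visitor_id") or request.remote_addr or "anonymous"
    if bucket(user_id) == "A":
        return "<h1>Simple pricing for every team</h1>"     # control copy
    return "<h1>Start free, upgrade when you grow</h1>"     # variant copy
```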
How do you interpret the results of an A/B test?
The results of an A/B test are measured based on the rate of conversions that are achieved. The definition of conversion can vary. It might include a click, video view, purchase, or download.
This step is also where that 95% statistical significance comes into play: it means that if you repeated the test many times, the measured range would contain the true conversion rate 95% of the time. However, you also have to consider the margin of error.
So if that margin is ±3%, that can be interpreted as follows: If you achieve a conversion rate of 15% on your test, you can say that conversions are between 12% and 18% with 95% confidence.
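Here is a rough sketch of where a margin of error like that comes from, using the normal approximation for a proportion; the visitor and conversion counts are invented.

```python
from statistics import NormalDist

def conversion_interval(conversions: int, visitors: int, confidence: float = 0.95):
    """Normal-approximation confidence interval for a conversion rate."""
    rate = conversions / visitors
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * (rate * (1 - rate) / visitors) ** 0.5
    return rate, margin

rate, margin = conversion_interval(conversions=150, visitors=1000)
print(f"{rate:.0%} +/- {margin:.1%}")  # roughly 15% +/- 2.2% at 95% confidence
```

Collecting more visitors shrinks the margin, which is another way of seeing why low-traffic pages struggle to reach significance.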
Frequentist approach
There are two ways to interpret A/B testing results.
The first is the frequentist approach. It starts from the null hypothesis that there is no difference between A and B.
Once testing ends, you will have a p-value, or probability value. This is the probability of seeing results at least as extreme as yours if there were truly no difference, so a low p-value means the observed difference is unlikely to be due to chance.
The frequentist approach is fast and popular, and there are many resources available for using this method. The downside is that it's impossible to get any meaningful results until the tests are fully completed. Also, this approach doesn't tell you how much a variation won by — just that it did.
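A minimal sketch of the frequentist calculation, using a two-proportion z-test on invented numbers:

```python
from math import erf, sqrt

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two standard-normal tails

# Hypothetical results: 100/1,000 conversions on A vs. 130/1,000 on B
print(f"p = {two_proportion_p_value(100, 1000, 130, 1000):.3f}")  # ~0.035, below 0.05
```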
Bayesian approach
The Bayesian approach incorporates existing data into the experiment. This data is known as the prior, with the prior being "none" in the first round of tests. In addition, there is the evidence, which is the data from the current experiment.
Finally, there is the posterior: the result produced by the Bayesian analysis of the prior and the evidence.
The key benefit of the Bayesian approach is that you can look at the data during the test cycle and call the results early once one variation is clearly winning.
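A minimal Bayesian sketch with uniform Beta(1, 1) priors (the "none" prior from a first round) and invented interim numbers; it estimates the probability that variation B truly converts better than A, the kind of figure you can check mid-test.

```python
import random

def probability_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                          draws: int = 100_000) -> float:
    """Monte Carlo estimate of P(variation B's true rate > variation A's)."""
    wins = 0
    for _ in range(draws):
        # Posterior for each variation: Beta(1 + conversions, 1 + non-conversions)
        sample_a = random.betavariate(1 + conv_a, 1 + n_a - conv_a)
        sample_b = random.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += sample_b > sample_a
    return wins / draws

# Hypothetical interim results checked partway through the test
print(probability_b_beats_a(100, 1000, 130, 1000))  # roughly 0.98
```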
How do companies use A/B testing?
A/B testing can be used by brands for many different purposes. As long as there is some sort of measurable user behavior, it's possible to test that.
The A/B test method is often used to test changes to website design, landing pages, content titles, marketing campaign emails, paid ads, and online offers. Generally, this testing is done without the test subjects knowing they are visiting Test Version A of a web page as opposed to Version B.
Stage 1: Measure
In this first stage, the idea is to identify ways to increase revenue by increasing conversions. That means analyzing website data and visitor behavior metrics.
Once this information has been gathered, you can use it to plan changes and create a list of website pages or other elements to be changed. After this is done, you may create a hypothesis for each element to be changed.
Stage 2: Prioritize
Set priorities for each hypothesis depending on its level of confidence, importance, and ease of implementation.
There are frameworks available to help with the process of setting these priorities, for example, the ICE, PIE, or LIFT models.
ICE model
ICE is importance/confidence/ease.
Importance — How important the page in question is
Confidence — The level of confidence that the test will succeed
Ease — How easy the test is to develop
Each item is scored, and an average is taken to rate its priority.
PIE model
PIE is potential/impact/ease.
Potential — The business potential of the page in question
Impact — The impact of the winning test
Ease — How easily the test can be executed
The variables here are slightly different from ICE but are scored in the same way.
PIE and ICE are both easy to use, but the downside is that they are subjective. People will apply their own biases and assumptions when scoring these variables.
LIFT model
LIFT Model is the third framework for analyzing customer experiences and developing hypotheses. This framework is based on six factors:
Value proposition — The perceived value a visitor gets from converting
Clarity — How clearly the value proposition and CTA are stated
Relevance — Relevance of the page to the visitor
Distraction — Elements that distract visitors from the CTA
Urgency — Items on the page that encourage visitors to act quickly
Anxiety — Anything creating hesitance or lack of credibility
Stage 3: Test
After prioritizing, determine which ideas will be tested and implemented first by reviewing the backlog of ideas. You should decide according to business urgency, resources, and value.
Once an idea is selected, the next step is to create a variation. Then, go through the steps of testing it.
Stage 4: Repeat
It's risky to test too many changes at the same time. Instead, test more frequently to improve accuracy and scale your efforts.
The top A/B testing tools to use
There are several tools available to help businesses set up, execute, and track A/B tests. They all vary both in price and capability.
The best web and app A/B testing tools
Optimizely
Optimizely is a platform for conversion optimization through A/B testing. Teams can use the tool to set up tests of website changes to be experienced by actual users. These users are routed to different variations; then, data is collected on their behavior. Optimizely can also be used for multipage and other forms of testing.
AB Tasty
AB Tasty offers A/B and multivariate testing. Testers can set up client-side, server-side, or full-stack testing. Additionally, there are tools like Bayesian Statistics Engine to track results.
Both Optimizely and AB Tasty seamlessly integrate with Fullstory so you can see how users who see different experiences behave.
VWO
VWO is the third big player in A/B testing and experimentation software. Like Optimizely and AB Tasty, they offer web, mobile app, server-side, and client-side experimentation, as well as personalized experiences.
Email A/B testing tools
There are several specialized tools for testing changes made to marketing campaign emails. Here are the most widely used:
Moosend
Moosend is a tool for creating and managing email campaigns. It offers the ability to create an A/B split test campaign. This ability lets marketers test different variations of marketing emails, measure user response, and select the version that works best.
Aweber
Aweber provides split testing of up to three emails. Users can test elements like subject line, preview text, and message content. Additionally, it allows for other test conditions such as send times. Testing audiences can be segmented if desired, and completely different emails can be tested against one another.
MailChimp
MailChimp users can A/B test email subject line, sender name, content, and send time. There can be multiple variations of each variable.
Then, the software lets users determine how the recipients will be split among each variation. Finally, testers can select the conversion action and amount of time that indicates which variation wins the test. For example, the open rate over a period of eight hours.
Constant Contact
Constant Contact offers subject line A/B testing. This feature helps users validate which version of an email subject line is most effective. It is an automated process where the tool automatically sends emails with the winning subject line to recipients once the winner is determined.
A/B testing and CRO services and agencies
Some companies have the infrastructure and personnel in place to run their own experimentation program, but other companies might not. Fortunately, there are services and agencies available to help drive your A/B testing and CRO efforts.
Conversion
Based in the UK, Conversion is one of the world's largest CRO agencies and works with brands like Microsoft, Facebook, and Google.
Lean Convert
Also based in the UK, Lean Convert is one of the leading experimentation and CRO agencies.
Prismfly
Prismfly is an agency that specializes in ecommerce CRO, UX/UI design, and Shopify Plus development.