How to Do A/B Testing as a Designer
A designer's guide to A/B testing — forming hypotheses, designing clean variants, understanding statistical significance, interpreting results, and avoiding common testing mistakes.
A/B testing is useful for one thing: determining whether a specific change to an existing design improves a specific metric. That's a narrow use case. Most design decisions shouldn't be A/B tested — either the traffic isn't there, the metric isn't clear, or the answer requires research, not statistics.
Used correctly, it's one of the most reliable ways to make a product decision. Here's how to do it properly.
When A/B testing is useful
A/B testing requires:
- Enough traffic. You need statistically significant sample sizes in a reasonable time frame. A page with 100 visitors a month can't generate valid A/B results. A checkout page with 10,000 visitors a week can.
- A clear metric. What defines success? Click-through rate on a button, conversion rate on a form, time to completion for a task. Vague goals like "make it more engaging" can't be measured.
- A specific, bounded change. You're changing one thing. Not a full redesign — one element.
If any of these conditions aren't met, A/B testing is the wrong tool. Use qualitative research instead.
How to form a hypothesis
A good A/B testing hypothesis has three parts: the change, the expected effect, and the reason.
Format: "If we [change], we expect [metric] to [increase/decrease] because [rationale based on evidence]."
Example: "If we change the primary CTA from 'Submit' to 'Start free trial', we expect the trial signup conversion rate to increase because user research indicates the current label creates uncertainty about what happens after clicking."
The "because" matters. It connects the test to real evidence rather than gut feel. A test based on a hypothesis with a clear rationale also gives you something to learn from if it fails — the rationale was wrong, which tells you something.
Designing the variants
In an A/B test, you have two variants: A (the current design, also called the control) and B (the changed version).
One change per test. This is the discipline that most A/B testing fails at. If you change the button color, the button label, and the button position simultaneously and the test shows improvement, you don't know which change caused it. You can't learn anything you can apply to the next test.
Design variant B in Figma with a single, clear difference from the control. Document exactly what changed in a Notion note attached to the test record — this matters when you're analyzing results three weeks later.
Statistical significance
Statistical significance is a measure of confidence that your result isn't due to random chance. The standard threshold is 95% confidence — meaning there's less than a 5% chance you'd see a difference this large if the two variants actually performed the same.
Why this matters for designers: you can't call a test "done" just because one variant is currently ahead. Early in a test, random variation causes the results to swing back and forth. Ending a test early because variant B is winning — before you've hit significance — produces unreliable conclusions.
The practical rule: set a minimum sample size before you start, based on the expected difference in conversion rate and your current traffic. Free calculators (Optimizely has one, as does Evan Miller's A/B test calculator) will give you this number. Don't touch the test until you've hit that sample size.
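If you want to see what those calculators are doing under the hood, here is a minimal sketch of the standard two-proportion sample-size formula (95% confidence, 80% power by default). The function name and the example rates are illustrative, not from any specific tool; real calculators may use slightly different corrections, so treat this as a ballpark, not gospel.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p1, p2, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect a change
    from baseline rate p1 to expected rate p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Example: baseline conversion 4%, hoping variant B reaches 5%
print(sample_size_per_variant(0.04, 0.05))
```

Note how quickly the number grows for small effects: detecting a 4% → 5% lift needs several thousand visitors per variant, which is why the low-traffic page in the earlier example can't produce valid results.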
Typical minimum run time: two weeks, even if you hit your sample size sooner. You need to account for weekly traffic patterns (weekday vs weekend behavior differs for most products).
Interpreting results
Three possible outcomes:
B wins (statistically significant). Implement B. Document the result: what changed, the improvement in metric, and the hypothesis that drove it. This becomes evidence you can reference in future design decisions.
A wins (B performed significantly worse). Keep A. The hypothesis was wrong. Document why — the rationale you used was incorrect. This is valuable learning.
Inconclusive (no significant difference). Neither variant is clearly better. This often means the change was too small to move the needle on the metric you measured, or the metric was wrong. Don't implement just because you tested.
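Your testing tool will classify the outcome for you, but the underlying check is a simple two-proportion test. This sketch (function name and example numbers are mine, not from any particular tool) shows how a p-value falls out of the raw counts — a p-value below 0.05 corresponds to the 95% confidence threshold above:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates.
    conv_*: conversions observed; n_*: visitors in each variant."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)       # pooled rate under "no difference"
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Example: A converts 400/10,000 (4.0%), B converts 480/10,000 (4.8%)
p = two_proportion_p_value(400, 10000, 480, 10000)
print(f"p = {p:.4f}")  # below 0.05 here, so significant at the 95% level
```

With the same traffic but B at only 410 conversions, the p-value lands well above 0.05 — the "inconclusive" outcome described above.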
Hotjar's heatmaps and session recordings can add qualitative context to A/B results. If B increased conversions but heatmaps show users hesitating on a new element, you might investigate further before declaring it a clean win.
Common mistakes designers make in A/B testing
Testing too many variables. Already covered — one change per test.
Ending tests early. The most common mistake. The data looks good on day three. Don't stop. It won't look the same on day 14.
Running tests on pages with too little traffic. Tests take months to reach significance at low traffic volumes. That's months where you could have made progress through qualitative research instead.
Testing without a hypothesis. "Let's try a red button and see what happens" produces data you can't act on. You need a reason before you run the test.
Confusing statistical significance with practical significance. A 0.2% lift in conversion rate might be statistically significant with high enough traffic. Is it worth the implementation cost? Decide before you test what lift would be meaningful enough to act on.
How designers fit into the A/B testing process
In most product teams, A/B testing is owned by product management or a growth team, not design. The designer's role is:
- Designing the variants based on the hypothesis
- Ensuring both variants are technically feasible to implement
- Interpreting the visual/behavioral findings from heatmaps and session recordings
- Using test results to inform future design decisions
Document your test results and their implications in Notion. Over time, you build a library of "what's known to work" for your specific product and audience — that's more valuable than any single test result.
Related
How to Test Your Designs at Every Stage
A practical guide to design testing — from concept sketches to live A/B tests — covering fidelity-appropriate methods, minimum viable testing, and how to analyze results.
Hotjar Review 2026: The Standard for Understanding What Users Actually Do
Hotjar offers heatmaps, session recordings, and feedback surveys to show how users behave on your live product. Free plan exists. Plus at $32/month. Here's the honest breakdown.
How to Do Usability Testing
A practical usability testing guide — moderated vs unmoderated, writing task scenarios, recruiting participants, and using Maze and Hotjar.