I kept a 5% holdout. It exposed my fake gains.
This is the move that changed how I trust my own dashboard.
All year I shipped 'winners.' Dashboard said I was up huge, cumulatively. Felt like a genius.
Then I checked my 5% holdout — a slice of traffic that NEVER got any of my changes, frozen on the original page.
My 'winners' vs the holdout? The real cumulative lift was about 40% of what the individual tests claimed. Wins overlap, decay, and interact. The dashboard double-counted.
The holdout is the only honest mirror you've got. It tells you what your work was ACTUALLY worth.
— Carve out a small permanent holdout that gets zero changes
— Compare your live page to it quarterly
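Here is a minimal Python sketch of both steps. It assumes every visitor has a stable user ID. The salt string, conversion rates, and per-test lifts are hypothetical. The "claimed" total is a simple sum of per-test lifts, which is an assumption about how the dashboard tallies wins. The holdout hashes the ID instead of rolling a random number per visit, so each user stays in the same bucket for the holdout's whole life.

```python
import hashlib

HOLDOUT_SHARE = 0.05   # the 5% slice that never sees any shipped change
SALT = "holdout-v1"    # hypothetical salt; never change it while the holdout runs

def in_holdout(user_id: str, share: float = HOLDOUT_SHARE, salt: str = SALT) -> bool:
    """Deterministically map a user ID to [0, 1); the lowest `share` lands in the holdout."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:15], 16) / 16**15  # first 60 bits -> roughly uniform float in [0, 1)
    return bucket < share

def realized_vs_claimed(live_rate: float, holdout_rate: float,
                        claimed_lifts: list[float]) -> tuple[float, float]:
    """Lift measured against the holdout, and what fraction of the claimed total it is."""
    measured = live_rate / holdout_rate - 1.0
    claimed = sum(claimed_lifts)  # a dashboard that simply adds up every test's win
    return measured, measured / claimed

# Hypothetical quarter: three shipped "winners", each with its own claimed lift.
measured, ratio = realized_vs_claimed(live_rate=0.0339, holdout_rate=0.030,
                                      claimed_lifts=[0.12, 0.15, 0.10])
print(f"measured lift {measured:.1%}, which is {ratio:.0%} of what the tests claimed")
```

Run the comparison on the same conversion metric for both groups over the same date range. Otherwise seasonality leaks into the gap.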
Go set up a holdout group today. Brace yourself. Report back.
I optimized clicks and accidentally tanked revenue.
Product grid test. New layout pushed cheaper items up top. Add-to-carts jumped 14%. I cheered.
Guardrail metric: average order value. Down 22%. People bought, just bought the cheap junk I'd promoted to the top.
Net revenue per visitor: down. My "win" lost money.
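Revenue per visitor is roughly conversion rate times average order value, so the two changes multiply rather than add. This quick Python check assumes that completed orders rose in line with the 14% add-to-cart jump. The post doesn't say that, so this is the optimistic case:

```python
def rpv_change(conversion_lift: float, aov_lift: float) -> float:
    """Relative change in revenue per visitor when RPV = conversion rate x AOV."""
    return (1 + conversion_lift) * (1 + aov_lift) - 1

change = rpv_change(conversion_lift=0.14, aov_lift=-0.22)
print(f"{change:+.1%}")  # → -11.1%
```

So even if every extra add-to-cart had turned into an order, a 22% AOV drop would still outweigh it.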
Now every test has one primary metric AND a guardrail it's not allowed to crater.
— Pick your one true north metric (usually revenue/visitor)
— Set guardrails: AOV, refund rate, churn, support tickets
— A primary win that breaks a guardrail is a loss
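The rule "a primary win that breaks a guardrail is a loss" can be written down before the test starts. Below is a hypothetical sketch. The metric names and tolerances are placeholders, and it leaves out the significance test you would still run on the primary metric:

```python
from dataclasses import dataclass

@dataclass
class Guardrail:
    name: str
    tolerance: float              # largest tolerated relative move in the wrong direction
    higher_is_better: bool = True  # AOV: higher is better; refund rate: lower is better

def verdict(primary_lift: float, guardrail_lifts: dict[str, float],
            guardrails: list[Guardrail]) -> str:
    """Ship only if the primary metric is up AND no guardrail moved past its tolerance."""
    for g in guardrails:
        lift = guardrail_lifts[g.name]
        harm = -lift if g.higher_is_better else lift  # positive = moved the wrong way
        if harm > g.tolerance:
            return f"loss: guardrail '{g.name}' broke ({lift:+.0%})"
    return "ship" if primary_lift > 0 else "no win"

rails = [Guardrail("aov", tolerance=0.05),
         Guardrail("refund_rate", tolerance=0.10, higher_is_better=False)]
print(verdict(0.14, {"aov": -0.22, "refund_rate": 0.0}, rails))  # → loss: guardrail 'aov' broke (-22%)
```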
Go add a revenue guardrail to your running test before you call it. Report back.
Day 2 of the test and I peeked. Big mistake.
Variant B was crushing. +22%. I texted my partner 'we found it.'
Day 5: +3%. Day 8: dead even.
That early spike was noise. Small samples swing wild. If you call it on day 2 you're just gambling on randomness and calling it skill.
The fix that saved me: I now set a fixed sample size BEFORE the test starts. No reading results until I hit it. I literally hide the dashboard.
If you must peek for sanity, use a sequential test (always-valid p-values) so early looks don't inflate your false positives.
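To see why daily peeking is gambling, you can simulate an A/A test in Python. Both arms are identical, so every "winner" is a false positive. Traffic, conversion rate, and the number of looks are arbitrary assumptions. The sketch runs a plain fixed-horizon z-test at every look, which is exactly what peeking does. It does not implement the sequential (always-valid) test mentioned above.

```python
import math
import random

def two_prop_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided pooled z-test for a difference in conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def simulate_aa(sims: int = 1000, days: int = 10, per_day: int = 100,
                rate: float = 0.10, alpha: float = 0.05, seed: int = 7) -> tuple[float, float]:
    """Return (false-positive rate when checking daily, rate with one look at the end)."""
    rng = random.Random(seed)
    peek_hits = fixed_hits = 0
    for _ in range(sims):
        conv_a = conv_b = n = 0
        ever_significant = False
        for _ in range(days):
            n += per_day
            conv_a += sum(rng.random() < rate for _ in range(per_day))
            conv_b += sum(rng.random() < rate for _ in range(per_day))
            if two_prop_p_value(conv_a, n, conv_b, n) < alpha:
                ever_significant = True  # a daily peeker would have called it here
        peek_hits += ever_significant
        fixed_hits += (two_prop_p_value(conv_a, n, conv_b, n) < alpha)  # single final look
    return peek_hits / sims, fixed_hits / sims

peek_rate, fixed_rate = simulate_aa()
print(f"false positives: peeking daily {peek_rate:.1%}, single look {fixed_rate:.1%}")
```

The single final look stays near the nominal 5%. Checking every day gives noise many more chances to cross the line.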
— Decide your sample size first
— Don't call a winner before you hit it
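One common way to pick the stop number is the standard fixed-horizon sample-size formula for comparing two conversion rates. Below is a sketch using the normal approximation. The 3% baseline and 10% minimum detectable lift are placeholders; use your own.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(baseline: float, mde_rel: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per arm for a two-sided two-proportion test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)           # smallest relative lift worth detecting
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

n = sample_size_per_arm(baseline=0.03, mde_rel=0.10)
print(f"stop at {n:,} visitors per arm")
```

Halving the lift you want to detect roughly quadruples the required sample. Small effects on low-traffic pages take a long time to measure honestly.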
Go set a stop number on your running test. Then close the tab.
Testing a pricing page without nuking revenue
Pricing tests scare people because you can torch real money. I run them in a safe order, lowest risk to highest:
— Start with layout, not price. 3 tiers vs 2. Toggle position. Zero revenue risk.
— Test the 'most popular' badge placement. Anchoring is free money.
— Test annual-vs-monthly default toggle. Defaulting to annual lifted my AOV without changing a price.
— Test feature framing and ordering inside each tier.
— Test the order of tiers (high-to-low anchors differently than low-to-high).
— ONLY then touch actual numbers, and cap exposure to 50% of traffic.
Never change price AND layout in one test. You won't know what moved.
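The two hard rules here (never mix price and layout in one test, cap price exposure at 50%) are easy to enforce with a pre-launch check. This is a hypothetical sketch: the factor names and the `PricingTest` structure are illustrative, not taken from any testing tool.

```python
from dataclasses import dataclass, field

PRICE_FACTORS = {"price"}
LAYOUT_FACTORS = {"tier_count", "toggle_position", "badge_placement",
                  "billing_default", "feature_order", "tier_order"}
MAX_PRICE_EXPOSURE = 0.50  # price changes reach at most half of traffic

@dataclass
class PricingTest:
    name: str
    changes: set[str] = field(default_factory=set)
    traffic_share: float = 0.5  # share of traffic that sees the variant

def problems(test: PricingTest) -> list[str]:
    """List every rule the planned test breaks; an empty list means it's safe to launch."""
    issues = []
    touches_price = bool(test.changes & PRICE_FACTORS)
    touches_layout = bool(test.changes & LAYOUT_FACTORS)
    if touches_price and touches_layout:
        issues.append("changes price AND layout: you won't know what moved")
    if touches_price and test.traffic_share > MAX_PRICE_EXPOSURE:
        issues.append(f"price exposure {test.traffic_share:.0%} exceeds the {MAX_PRICE_EXPOSURE:.0%} cap")
    unknown = test.changes - PRICE_FACTORS - LAYOUT_FACTORS
    if unknown:
        issues.append(f"unrecognized factors: {sorted(unknown)}")
    return issues

plan = [
    PricingTest("badge first", {"badge_placement"}),
    PricingTest("annual default", {"billing_default"}),
    PricingTest("new price + 2 tiers", {"price", "tier_count"}, traffic_share=0.6),
]
for t in plan:
    print(t.name, "->", problems(t) or "ok")
```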
Go test your 'most popular' badge first. It's free, it's fast, it anchors.
—
There's a lot more cbo vs abo scaling like this on @ScaleOrStall
I almost shipped a fake winner
This week I ran a hero test. Variant B up 14%. I was reaching for the ship button.
Then I checked the split. 52/48. Supposed to be 50/50.
That gap is sample ratio mismatch. Means something upstream broke the randomization — a redirect, a cache, a bot bucket. The 14% was probably an artifact, not a win.
Killed the test. Found a cached page serving B to returning users only. Of course B looked better. It was talking to warm traffic.
Day 1 lesson I relearn every quarter: before you read the lift, read the counts.
— Pull your variant traffic numbers right now
— If the split is off by more than ~1%, your result is garbage
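How far off counts as "too far" depends on volume, so the ~1% rule is a heuristic. The usual check is a chi-square goodness-of-fit test on the raw counts, and p < 0.001 is a commonly used SRM alarm threshold. Below is a sketch with hypothetical counts. The same 52/48 split is unremarkable at 1,000 users and a clear mismatch at 10,000.

```python
import math

def srm_p_value(n_a: int, n_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 degree of freedom) on the observed split."""
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

for n_a, n_b in [(520, 480), (5_200, 4_800)]:
    p = srm_p_value(n_a, n_b)
    flag = "SRM, don't read the lift" if p < 0.001 else "no evidence of SRM"
    print(f"{n_a}/{n_b}: p = {p:.2g} -> {flag}")
```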
Go check the SRM on your live test. Report back.
This week: button color vs button verb
Everyone wants to test green vs orange. Snooze.
I tested the VERB instead. Same button, same color. Just changed 'Get Started' to 'Get My Free Audit.'
Setup: landing page for a CPA offer, ~6k clicks split over 9 days.
Result: the specific-value verb pulled +19% clicks to the form. The color test I ran last month? Flat. Couldn't tell them apart with a microscope.
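Before trusting a lift like this, check that it clears the noise. The post doesn't give per-arm counts, so the numbers below are hypothetical, sized to roughly match a ~6k split. The sketch computes the relative lift and a Wald 95% confidence interval for the absolute difference in click-through. If the interval excludes zero, noise alone is an unlikely explanation at that level. A flat result like the color test shows up as an interval straddling zero.

```python
import math
from statistics import NormalDist

def lift_with_ci(clicks_a: int, n_a: int, clicks_b: int, n_b: int,
                 level: float = 0.95) -> tuple[float, tuple[float, float]]:
    """Relative lift of B over A, plus a Wald interval for the absolute rate difference."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    return p_b / p_a - 1, (diff - z * se, diff + z * se)

# Hypothetical: ~3,000 visitors per arm, 10.0% vs 11.9% clicking through to the form.
lift, (lo, hi) = lift_with_ci(300, 3000, 357, 3000)
print(f"lift {lift:+.0%}, 95% CI for the difference: {lo:+.2%} to {hi:+.2%}")
```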
The lesson: color is decoration. The verb is the promise. 'Get My Free Audit' tells them what they walk away with. 'Get Started' tells them about work they have to do.
— Open your hero CTA
— Swap the generic verb for one that names the payoff
Go change your CTA verb to claim the reward, not start the chore. Report back.
