What Is Peeking And How Do I Avoid It?

There are plenty of mistakes you can make when running tests on your website. Peeking is one of the trickiest. Here are some tips to avoid it and get the results you want.

Have you ever sacrificed the purity of a test to make a quicker decision?

Maybe you’ve taken a look at the data before you reached your prescribed sample size and felt the test was already giving obvious results in one direction or the other.

Almost every product owner running tests faces this dilemma. And trust me, I get it.

The pursuit of results at an adequate sample size takes patience, and there are plenty of reasons to try to rush to the finish line. As you wait for a test to run, you may be prolonging a negative experience on your site. Or you may be missing the chance to present a positive experience to all your visitors. You could also be struggling to hold off stakeholders who want a quick decision and implementation.

But it’s crucial to let your tests run for their pre-established amount of time. If you act on interim results instead, you are “peeking,” which increases error rates, causes false positives, and invalidates results.

What is peeking?

Peeking is the act of looking at your A/B test results with the intent to take action before the test is complete.

Because most experiments have a 70% chance of looking “significant” before they are truly done collecting sufficient data, acting on early results introduces bias and leads to decisions based on incomplete information. Waiting until the test is complete allows for a more accurate assessment of statistical significance.

Think about it like this. If you flip a coin twice and get heads both times, you could incorrectly assume that the coin will land on heads 100% of the time.

However, if you flipped it a large enough number of times, the share of heads would get closer to the true rate of 50%.

Testing is the same. If you don’t allow enough time for an experiment to run and are constantly peeking at the results to make faster decisions, you are likely to make incorrect assumptions.
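The effect is easy to reproduce with a small simulation. The sketch below is an illustration only, not a description of how any particular testing tool works. It assumes an A/A test (both variants share the same true conversion rate, so every "significant" result is a false positive), a pooled two-proportion z-test, and made-up traffic numbers. All function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no variation observed yet, so nothing to detect
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_test(rate, visitors_per_arm, looks, alpha, rng):
    """Run one A/A test.

    Returns (significant_at_any_look, significant_at_final_look).
    """
    # Evenly spaced peeks; the last one is the planned end of the test.
    checkpoints = {visitors_per_arm * (i + 1) // looks for i in range(looks)}
    conv_a = conv_b = 0
    significant_at_any_look = False
    for n in range(1, visitors_per_arm + 1):
        conv_a += rng.random() < rate
        conv_b += rng.random() < rate
        if n in checkpoints and z_test_p_value(conv_a, n, conv_b, n) < alpha:
            significant_at_any_look = True
    final_p = z_test_p_value(conv_a, visitors_per_arm, conv_b, visitors_per_arm)
    return significant_at_any_look, final_p < alpha


def false_positive_rates(trials=500, rate=0.05, visitors_per_arm=2000,
                         looks=20, alpha=0.10, seed=7):
    """Share of A/A tests flagged when peeking vs. when waiting for the end."""
    rng = random.Random(seed)
    peek_hits = final_hits = 0
    for _ in range(trials):
        peeked, final = simulate_aa_test(rate, visitors_per_arm, looks, alpha, rng)
        peek_hits += peeked
        final_hits += final
    return peek_hits / trials, final_hits / trials


if __name__ == "__main__":
    with_peeking, at_the_end = false_positive_rates()
    print(f"False positives if you stop at any 'significant' peek: {with_peeking:.0%}")
    print(f"False positives if you only check at the end: {at_the_end:.0%}")
```

Because the final look is one of the checkpoints, the peeking rate can never be lower than the end-of-test rate, and with repeated looks it is typically much higher.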

Plenty of mistakes can be made when running a test, but peeking is trickier to avoid than the rest. It tempts even the most experienced experimentation practitioners.

So, let’s talk about it and how you can avoid getting caught in its crosshairs.

How To Avoid Peeking: Set Clear Minimum Standards Before You Run A Test & Stick To Them

Before running a test, clearly define minimum standards to ensure the results are valid. These are the same standards to use when interpreting the results of a test.

Pre-determine your Significance Level

The general rule is to let your tests reach 90+% statistical significance, but the exact number can vary slightly depending on your team’s risk tolerance.

NASA scientists likely need a 99.999999% statistical significance before feeling sure of a decision, while an ecommerce site owner might only need 85% statistical significance to feel confident in their decisions.

Establishing the significance level helps your team come to a consensus about the level of error, or the rate of false positives, you’re willing to accept in your test.
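One way to make that trade-off concrete is to translate a significance threshold into the number of false positives you would expect. The sketch below assumes the significance percentage can be read as one minus the false-positive rate (alpha) of a classical test. That is a simplification, and a given testing tool may define its number differently. The function name is hypothetical.

```python
def expected_false_positives(significance_pct, tests_with_no_real_effect):
    """Expected number of 'winners' among tests where nothing actually changed."""
    alpha = 1 - significance_pct / 100  # accepted false-positive rate per test
    return alpha * tests_with_no_real_effect


# Out of 20 tests with no real effect, each checked once at the planned end:
print(round(expected_false_positives(85, 20), 2))  # 3.0
print(round(expected_false_positives(90, 20), 2))  # 2.0
```

These figures only hold when each test is evaluated once, at its planned end. Acting on interim looks pushes the real rate higher.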

Enjoying this article?

Subscribe to our newsletter, Good Question, to get insights like this sent straight to your inbox every week.

Achieve Appropriate Sample Size

Set a goal sample size that is representative of your audience and large enough to account for variability. It’s necessary to calculate your sample size before the test to determine how long to run a test to achieve rapid but reliable results.

If the test is stopped before it reaches the planned number of visitors, the results may not be valid. That’s because when the number of sessions or conversions is low, there is a high likelihood that any observed changes are due to chance.
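As a rough illustration of the pre-test calculation, the sketch below uses a common textbook formula for comparing two proportions with a two-sided test. It is not necessarily the formula your testing tool uses. The baseline rate, the smallest lift you care about detecting, and the 80% power default are assumptions you would replace with your own values, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift,
                            significance=0.90, power=0.80):
    """Approximate visitors needed in each variant to detect the given lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # rate if the variant truly wins
    alpha = 1 - significance
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical site: 3% baseline conversion rate, 20% relative lift.
    print(sample_size_per_variant(0.03, 0.20))
```

Smaller lifts and stricter significance levels both push the required sample size up, which is why the target needs to be agreed on before the test starts.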

As the test collects more data, the observed conversion rates converge toward their true long-term values. Extreme early results tend to shrink back toward the average, an effect known as “regression to the mean.” We often see a false positive on the first day of running an A/B test, and we expect those changes to regress to the mean or “normalize” over time.

For example, we might see a novelty effect, where existing users who are already familiar with your site react positively to the changes in your experiment simply because they are new. That would result in a false positive that normalizes as users get used to the change. That’s why seeing 90+% statistical significance isn’t a stopping rule on its own.

Set a Minimum Test Duration

Letting your test run for a pre-allotted amount of time is key to avoiding the pitfalls of peeking.

We suggest a minimum of two weeks to account for two full business cycles. This leaves room for any unexpected variables (maybe your competitor is running a sale that week, which lowers your traffic volume, or there is a federal holiday, so fewer people are online shopping).
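The duration can be planned from the sample size and your traffic. The sketch below assumes traffic is split evenly across variants and rounds up to whole weeks so that every weekday is represented equally. The two-week floor comes from the recommendation above. The function name and the traffic figures are hypothetical.

```python
import math


def planned_test_weeks(required_per_variant, variants, weekly_visitors, min_weeks=2):
    """Whole weeks to run: enough for the sample size, never below the floor."""
    total_needed = required_per_variant * variants
    weeks_for_sample = math.ceil(total_needed / weekly_visitors)
    return max(min_weeks, weeks_for_sample)


# High-traffic site: the sample arrives within a week, but the floor still applies.
print(planned_test_weeks(11000, 2, 30000))  # 2
# Lower-traffic site: the sample size drives the duration.
print(planned_test_weeks(11000, 2, 5000))   # 5
```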

Another important factor is understanding the range of likely performance for any test, and that range gets narrower with more data. Testing tools may show statistical significance with even a small sample size, but even those tools will recommend running tests for at least two weeks.
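To see how that range tightens, the sketch below computes a normal-approximation interval for the difference in conversion rate between two variants. This is one common way to express a performance range and is an assumption here, since tools report ranges in different ways. The function name and the visitor counts are hypothetical.

```python
from statistics import NormalDist


def lift_interval(conv_a, n_a, conv_b, n_b, confidence=0.90):
    """Interval for the absolute difference in conversion rate (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Unpooled standard error of the difference between two proportions.
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Same observed rates (3.0% vs 3.6%) with 10x more visitors.
    for visitors in (1_000, 10_000):
        low, high = lift_interval(visitors * 30 // 1000, visitors,
                                  visitors * 36 // 1000, visitors)
        print(f"{visitors:>6} visitors per variant: {low:+.4f} to {high:+.4f}")
```

With the same observed rates, ten times the data narrows the interval by a factor of about 3.2 (the square root of 10), which is why early ranges are too wide to act on.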

As noted above, acting on results before that point is peeking, which can lead to false positives.

For a test duration cap, everyone is different. As our Director of CRO and UX Strategy, Natalie Thomas, says:

“Every team has a different tolerance for test duration. I know teams that will let a test run for six months and others that only want to prioritize initiatives that will see significance in two weeks. Having this litmus just assures folks are talking about their tolerance up-front.”

– Natalie Thomas

Set a Minimum Number of Conversions

Set a goal for the number of conversions, or other actions taken, that is large enough for your audience to tell whether the test was a winner.

This will vary based on the primary goal you’re focused on (e.g., ecommerce transactions versus inquiries), so you’ll need to know the average number of conversions you get in a week.

Look for Alignment with Secondary Goals 

Part of the test setup process is defining a primary goal (for us, typically transactions or increasing conversion rate) that will determine if a test is ‘successful.’ Secondary goals, such as adds to cart or visits to the next stage in the funnel, provide more insight into behavior.

If you’re at a point where you’re trying to analyze results, and these are not aligned (e.g., conversion rate is up, adds to cart are down, visits to product pages are unaffected), it could mean you don’t have enough data yet to tell the whole story.
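Taken together, the standards in this section can be written down as one pre-agreed checklist. The sketch below is a hypothetical way to encode it, not a feature of any testing tool. The class names, field names, and default thresholds are all illustrative assumptions, and treating a secondary goal as "conflicting" when it moves in the opposite direction from the primary goal is a deliberate simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentStandards:
    """Minimums agreed on before the test starts (illustrative defaults)."""
    min_significance_pct: float = 90.0
    min_visitors_per_variant: int = 11000
    min_days: int = 14
    min_conversions_per_variant: int = 300


@dataclass
class ExperimentSnapshot:
    """What the dashboard shows at the moment someone wants to make a call."""
    significance_pct: float
    visitors_per_variant: int
    days_running: int
    conversions_per_variant: int
    primary_lift: float
    secondary_lifts: dict = field(default_factory=dict)


def unmet_standards(standards, snapshot):
    """List every pre-agreed standard the test has not met yet."""
    unmet = []
    if snapshot.days_running < standards.min_days:
        unmet.append("minimum duration")
    if snapshot.visitors_per_variant < standards.min_visitors_per_variant:
        unmet.append("sample size")
    if snapshot.conversions_per_variant < standards.min_conversions_per_variant:
        unmet.append("minimum conversions")
    if snapshot.significance_pct < standards.min_significance_pct:
        unmet.append("significance level")
    # A secondary goal conflicts if it moved opposite to the primary goal.
    conflicting = [name for name, lift in snapshot.secondary_lifts.items()
                   if lift * snapshot.primary_lift < 0]
    if conflicting:
        unmet.append("secondary goals conflict: " + ", ".join(conflicting))
    return unmet


if __name__ == "__main__":
    day_three = ExperimentSnapshot(
        significance_pct=96.0, visitors_per_variant=1500, days_running=3,
        conversions_per_variant=40, primary_lift=0.12,
        secondary_lifts={"adds_to_cart": -0.04, "product_page_visits": 0.0},
    )
    # High significance on day three still fails most of the checklist.
    print(unmet_standards(ExperimentStandards(), day_three))
```

An empty list means every standard has been met and the results are ready to interpret. Anything else means acting on the results now would be peeking.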

A/B Testing Like The Experts

In the same way opening the oven before a cake is fully baked can impact the cake’s final consistency, prematurely analyzing test results can lead to skewed outcomes.

When testing on a website or app, the goal is to gather user-centered evidence that helps you make a decision.

You’re looking for a signal, not the final answer.

But to be confident in that signal, you need to set your tests up with pre-established standards. It helps your team align on when to end the test and makes sure you avoid the trickiest pitfall of experimentation: peeking.

About the Author

Maggie Paveza

Maggie Paveza is a Strategist at The Good. She has years of experience in UX research and Human-Computer Interaction, and acts as an expert on the team in the area of user research.