Frequentist sequential testing

6/13/2023

Fixed horizon tests generally rely on a power analysis before the experiment to estimate when you can check the results, and they stipulate that you do not peek at the results before that time. This creates friction for users who are not experts in experimentation: it assumes deep knowledge of power analysis, and it prevents users from checking their results early, which can slow down iteration.

To complement fixed horizon testing, Split also offers sequential testing, which does not require pre-experiment power analysis (see below for definition of power) and allows early peeking at results. Specifically, the sequential testing method, called the mixture Sequential Probability Ratio Test (mSPRT), lets you check the results almost immediately after launching the experiment, as many times as you like, while controlling a user-specified false positive rate (see below for definition of false positive). If there is a difference between treatment and control, this method is also guaranteed to detect it eventually, given enough data.

Both fixed horizon and sequential testing use 2-tailed t-tests, which lets you detect significance in both directions (positive and negative). Both tests allow Split to calculate the impact and a p-value. However, because sequential testing is not a cure-all for experimental issues, you should normally use it in situations where traffic is high and the expected experimental impact is large.
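To make the idea of peeking without inflating the false positive rate more concrete, here is a minimal sketch of an always-valid p-value computed with a textbook form of mSPRT. It is not Split's implementation. It assumes a simplified setting: a stream of per-pair differences between treatment and control that are normally distributed with a known variance `sigma2`, and a normal mixing distribution with variance `tau2` centered on the null value `theta0`. The function name and all parameter names are hypothetical.

```python
import math


def msprt_p_values(diffs, sigma2, tau2, theta0=0.0):
    """Return the always-valid p-value after each observation.

    Assumptions (illustrative only): each element of `diffs` is a
    treatment-minus-control difference with known variance `sigma2`,
    and the mixing distribution is Normal(theta0, tau2).
    """
    p = 1.0          # the p-value starts at 1 and can only decrease
    total = 0.0      # running sum of (difference - theta0)
    history = []
    for n, d in enumerate(diffs, start=1):
        total += d - theta0
        # Log of the mixture likelihood ratio for normal data with a
        # normal mixing distribution (closed form).
        log_lr = (
            0.5 * math.log(sigma2 / (sigma2 + n * tau2))
            + (tau2 * total ** 2) / (2.0 * sigma2 * (sigma2 + n * tau2))
        )
        # The always-valid p-value is the running minimum of 1 / LR.
        p = min(p, math.exp(-log_lr))
        history.append(p)
    return history


# No effect at all: the p-value never moves away from 1.
null_history = msprt_p_values([0.0] * 20, sigma2=1.0, tau2=1.0)

# A large, consistent effect: the p-value drops quickly.
effect_history = msprt_p_values([1.0] * 50, sigma2=1.0, tau2=1.0)
```

Because the p-value is a running minimum, you can compare it against your chosen false positive rate after every new observation and stop the first time it falls below that threshold. The choice of `tau2` affects how quickly effects of different sizes are detected, which is one reason the method works best when the expected impact is large.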

Using a frequentist vs a Bayesian approach

Frequentist testing is the most commonly used hypothesis testing framework in industry, science, and medicine. Bayesian analysis requires a well-formed prior, which is information obtained from previous experiments. Priors often require additional work from users, who are often not trained to work with them. If you don't have previous data, you can use your best guess, but that guess is often inaccurate or biased. This is why the frequentist approach is usually preferred. Another advantage of the frequentist system is that we can share our data clearly with customers so they can follow the work, whereas Bayesian analysis is not as easily replicated (especially at scale). We have also been able to leverage improvements pioneered by industry leaders in product experimentation at companies like LinkedIn and Microsoft.

Within the frequentist framework, Split offers sequential and fixed horizon testing methods. With fixed horizon, Split uses Welch's t-test, also known as the unequal variances t-test. This method does not assume equal variances across treatment and control, and it obtains more accurate results than a traditional Student's t-test when the variances or sample sizes of the two groups differ.
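As an illustration of the Welch's t-test used for fixed horizon testing, the sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom with only the Python standard library. The function name `welch_t` and the sample data are made up for this example, and it is not Split's production code.

```python
import math
from statistics import mean, variance


def welch_t(control, treatment):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    n1, n2 = len(control), len(treatment)
    # Each group keeps its own sample variance; nothing is pooled.
    se1 = variance(control) / n1
    se2 = variance(treatment) / n2
    t = (mean(treatment) - mean(control)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Made-up metric values; the treatment group is visibly noisier.
control = [10.0, 12.0, 11.0, 13.0, 9.0]
treatment = [14.0, 18.0, 11.0, 20.0, 17.0]

t_stat, dof = welch_t(control, treatment)
```

Turning `t_stat` and `dof` into a two-tailed p-value requires the CDF of the t distribution, which the standard library does not provide. If SciPy is available, `scipy.stats.ttest_ind(treatment, control, equal_var=False)` performs the same test and returns the p-value directly.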
