Conversion rate optimization (CRO), when done properly, is a little like printing money. Instead of spending thousands of dollars doubling the traffic to your website, you can spend very little (in comparison) doubling conversions from your existing visitors.
And it works.
In fact, when I asked over 3,000 CRO tool users what their average return on investment was, it came to a staggering 224 percent. Almost 6 percent of these respondents had gained over 1,000 percent returns.
But CRO isn’t as easy as switching on a tap and waiting for the dollars to flow out. It takes work. It takes understanding. It takes patience. The latter is most often required when undertaking an A/B split test.
Whenever I talk to anyone about split testing, the same questions come up early in the conversation. “How long should I run my test for?” “How do I know when to pick a winner?” “What happens if I stop the test early?”
While many calculators and white papers exist to help answer these questions, the answers usually call for larger sample sizes and longer timeframes than the user may at first expect.
Today, Wingify announced the launch of its new SmartStats product, which aims to solve the problems created by standard split-test measurement methods through the use of a Bayesian-powered engine.
The company claims its new system determines the winner of a test in half the time of traditional methods. From today, SmartStats powers Wingify’s flagship CRO tool, Visual Website Optimizer (VWO).
To perform an A/B split test, you show something new to a test group while showing the existing version to a control group. The idea is deceptively simple. If the test group responds better than the control group to the new button, graphic, color scheme, text, or whatever else you've decided to change, you make that change permanent.
But before declaring that change a "winner," you also have to determine whether the difference in performance is due to chance. Doing this with standard methods, known as "classical frequentist" A/B testing, can be problematic and slow.
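In practice, the classical check usually boils down to a two-proportion z-test. A minimal sketch, using made-up traffic numbers (the function name and figures are illustrative, not Wingify's):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Classical frequentist check: is the gap between control A and
    variant B larger than chance alone would explain?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error of the gap
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))            # two-sided p-value
    return z, p_value

# Hypothetical test: 5.0% vs. 5.6% conversion, 10,000 visitors per arm
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # here p is just above 0.05, so not yet "significant"
```

Even a 12 percent relative lift, as here, can fail to clear the conventional 0.05 threshold at this traffic level, which is exactly why the sample-size question keeps coming up.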
Classical frequentist models suffer from a problem some people call “the peek issue” — a statistical quandary that causes results to look more significant than they are as you repeatedly look at the experiment, often causing the user to end the test early.
“This issue happens for classical frequentist A/B testing methodologies because in such methods you are required to fix a sample size in advance,” Paras Chopra, Wingify’s founder and CEO, told me. “With classical frequentist, only a fixed sample guarantees you that the results you see at the end of the test are within acceptable error bounds.”
And the more you peek at a test, the worse the issue becomes.
“If you happen to see the results in between, the errors naturally increase because some of your assumptions for the test break down and you underestimate the actual errors,” Chopra said. “Fundamentally, frequentist methods ask ‘out of 100 A/A tests I run, how many samples are required for each A/A test so that in the end, no more than five A/A tests produce a significant result.’ You would notice that this threshold of five requires establishing a fixed sample in advance. And if you peek at results before this fixed sample has been collected, you increase the probability that what you see is an error from five percent to higher. There are sequential versions of this frequentist method that circumvent the problem, but those versions usually take a lot more samples, thus the tests need to be run longer.”
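The inflation Chopra describes is easy to reproduce in simulation: run A/A tests, where both arms are truly identical, and check for significance at every peek. Any "significant" result is by construction a false positive. This is an illustrative sketch with made-up traffic, not Wingify's code:

```python
import random
from math import sqrt
from statistics import NormalDist

def z_significant(c_a, c_b, n, alpha=0.05):
    """Two-proportion z-test with n visitors in each arm."""
    p_pool = (c_a + c_b) / (2 * n)
    if p_pool in (0.0, 1.0):
        return False
    se = sqrt(p_pool * (1 - p_pool) * 2 / n)
    z = abs(c_b / n - c_a / n) / se
    return z > NormalDist().inv_cdf(1 - alpha / 2)

def aa_test(n_max, peek_every, p=0.05, rng=random):
    """One A/A test: identical 5% conversion in both arms. Returns True
    if any peek ever declares significance -- a guaranteed false positive."""
    c_a = c_b = 0
    for i in range(1, n_max + 1):
        c_a += rng.random() < p
        c_b += rng.random() < p
        if i % peek_every == 0 and z_significant(c_a, c_b, i):
            return True
    return False

random.seed(0)
trials = 400
fixed = sum(aa_test(5_000, peek_every=5_000) for _ in range(trials)) / trials
peeky = sum(aa_test(5_000, peek_every=250) for _ in range(trials)) / trials
print(f"false positives, one look at the end: {fixed:.1%}")  # near the nominal 5%
print(f"false positives, peeking 20 times:    {peeky:.1%}")  # substantially higher
```

The single-look error rate stays near the advertised 5 percent; checking the same data 20 times pushes it well above that, which is the "threshold of five" breaking down.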
Bayesian methods get around this problem entirely. Chris Stucchio, Wingify’s director of data sciences, explained it succinctly.
“Bayesian statistics asks a different question — ‘given the information we’ve seen, what is the probability that B has a higher conversion rate than A?’ This question does not have a specific time built into it,” Stucchio said.
In the context of an A/B split test, Bayesian methods certainly work well. You can stop a test whenever you like and the results will still be valid.
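In the common Beta-Binomial formulation of that question (a standard model, not necessarily the exact one SmartStats uses), "what is the probability that B beats A?" has a direct Monte Carlo answer: sample from each arm's posterior and count how often B comes out ahead. The figures below are hypothetical:

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, rng=random):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent
    Beta(1 + conversions, 1 + non-conversions) posteriors (uniform priors)."""
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

random.seed(1)
# Same hypothetical numbers: 5.0% vs. 5.6% conversion, 10,000 visitors per arm
pr = prob_b_beats_a(500, 10_000, 560, 10_000)
print(f"P(B > A) = {pr:.3f}")
```

Because the answer is a probability about the rates themselves rather than a pass/fail verdict tied to a pre-fixed sample, it remains meaningful at whatever point you choose to read it.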
But is a Bayesian-powered engine right for all types of business, no matter the traffic levels, or is it better for small businesses?
“Bayesian experimentation works for businesses of all sizes,” Chopra said. “In fact, we’ve seen SmartStats on an average to be 50 percent faster compared to other methods. This means that even an SME with lesser traffic can derive statistically valid results faster than before.”
And when you're an agile small business trying to turn a profit, speed matters. A calculator on Wingify's SmartStats website shows the difference in sample size needed to produce a statistically significant test result.
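For a sense of the baseline being compared against, the classical fixed-sample requirement can be computed up front. A sketch of the standard two-proportion power calculation, with made-up inputs (this is not Wingify's calculator):

```python
from math import ceil
from statistics import NormalDist

def fixed_sample_size(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Per-arm sample size a classical two-proportion test must fix in
    advance to detect the given relative lift."""
    nd = NormalDist()
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # two-sided significance
    z_power = nd.inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(variance * (z_alpha + z_power) ** 2 / (p2 - p1) ** 2)

# Detecting a 10% relative lift on a 5% baseline takes tens of
# thousands of visitors per arm under the classical approach:
n = fixed_sample_size(0.05, 0.10)
print(f"visitors needed per arm: {n:,}")
```

Numbers like these are why "how long should I run my test?" is such a common question, and why a method that reaches valid conclusions on fewer samples matters most to lower-traffic sites.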
“As for enterprises, I think it scales really well for them. With large amounts of data, SmartStats will not only give smart business decisions but the additional samples will provide a lot more corroboration to such business decisions,” Chopra said. “Enterprises will certainly appreciate looking at the results from both lenses — business and scientific.”
So what’s next for Wingify? Will it be rolling out Bayesian-powered methods to other areas of the business?
“We’re aiming to completely move to Bayesian methodologies throughout,” Chopra said. “It helps us do many more interesting things in future in segmentation and personalization, among various other things.”
SmartStats is available today for existing customers, and new users can try VWO — now powered by the SmartStats engine — free for thirty days.
VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.