> The minimum threshold odds [to stop showing a variation] are calculated by 40% / number enabled variations
Uh... How did you choose those cutoffs? Looks like you have a significant chance of making the wrong choice.
Also:
> Once enough data is collected to start making conclusions (1000 sends per variation)
You should check out the Bayesian solution to the Multi-Armed Bandit problem. It's very close to what you are doing, but makes decisions much faster than you do because it isn't deciding to turn off a variation, merely to scale it down.
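To make the idea concrete, here is a minimal sketch of Thompson sampling, one common Bayesian approach to the multi-armed bandit problem. Each variation keeps a Beta posterior over its conversion rate; on every send you sample from each posterior and pick the largest sample, so weak variations get scaled down smoothly rather than switched off at a hard threshold. The variation counts and true rates below are made up for illustration.

```python
import random

def choose_variation(stats):
    """stats: list of (successes, failures) per variation.

    Sample each variation's Beta(successes+1, failures+1) posterior
    and return the index of the largest sample.
    """
    samples = [random.betavariate(s + 1, f + 1) for s, f in stats]
    return max(range(len(samples)), key=lambda i: samples[i])

def update(stats, i, converted):
    """Record one send's outcome for variation i."""
    s, f = stats[i]
    stats[i] = (s + 1, f) if converted else (s, f + 1)

# Toy run: three variations with hypothetical true conversion rates.
stats = [(0, 0), (0, 0), (0, 0)]
true_rates = [0.03, 0.05, 0.04]
for _ in range(5000):
    i = choose_variation(stats)
    update(stats, i, random.random() < true_rates[i])
```

Note that a variation is never permanently excluded: a posterior with little mass near the top still occasionally wins a sample, so the algorithm keeps exploring while exploiting the current leader.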
Thanks a lot for the feedback. This is a first-pass implementation, but I agree that more thought should be put into the cutoff threshold, specifically for when there are initially only 2 (or maybe 3) variations.
I would suggest that you put together a quick Monte Carlo simulation for any of the models you're experimenting with to see how well they perform when you actually know the true conversion rates. There are plenty of theoretical issues you can find with any method, and the more complex your approach, the harder it is to work it all out with pencil and paper. Likewise, because you're dealing with probabilistic solutions, real-world results can be deceptive (for example, conversion rates may naturally fluctuate between weeks or months). I've found that testing with simulations is the best way to get a real sense of how whatever method you employ will work.
I would ignore any non-Bayesian MAB posts out there. The formulation used by other approaches considers an infinite number of repeated trials, which is basically an insane assumption. Epsilon-greedy and UCB1 aren't optimal except under that assumption.
Except in that example the author is choosing E based on the observed difference between two means (which is exactly the unknown you're trying to determine, so it makes no sense to use it as a constant in the formula), rather than as the threshold for the minimum difference you care about.
If you're going the classical statistics route, the entire point is that you need to determine your sample size before you peek at the data. In that post you would need to replace E with the minimum difference you care about detecting, calculate n before you start the test, and not look at the results until you had reached n observations.
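For reference, the fixed-horizon calculation looks roughly like this. It uses the standard two-sample approximation n ≈ 2 (z_α/2 + z_β)² p(1−p) / E², with the conventional 5% two-sided significance and 80% power; the baseline rate and minimum detectable difference below are placeholder numbers.

```python
import math

def sample_size(p_baseline, min_detectable_diff,
                z_alpha=1.96, z_beta=0.84):
    """Per-variation sample size for a two-sample proportion test.

    z_alpha=1.96 and z_beta=0.84 correspond to 5% two-sided
    significance and 80% power.
    """
    p = p_baseline
    return math.ceil(2 * (z_alpha + z_beta) ** 2
                     * p * (1 - p) / min_detectable_diff ** 2)

# E.g. a 5% baseline conversion rate, caring about an absolute
# lift of 1 percentage point:
n = sample_size(0.05, 0.01)
```

The key discipline is that E is chosen from business judgment up front, n is computed from it, and the data isn't examined until n observations per variation have arrived; peeking early inflates the false-positive rate.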
However, with the multi-armed bandit approach it does make sense to look at the data continuously, since it's a continuous optimization problem and the overall average conversion rate ends up higher.
That visualwebsiteoptimizer page is mathematically confused and its simulation is fatally flawed; I wish it would disappear from the internet. Check the comments.
With email optimization in particular, be warned that people will respond to a change simply because it is different, but then acclimatize.
What that means is that the changed version tends to win the test but then may or may not keep performing well. The flip side is that if you have the choice, keep multiple variations of the same email in rotation so that people don't get too used to your emails.
Nice to see GAs applied to more things. People often do not realize how many problems with a feedback mechanism can be solved within the GA framework. Here is the small tutorial that got me started with GAs a long, long time ago: http://www.ai-junkie.com/ga/intro/gat1.html
The neural network tutorial on the same site is pretty cool too.
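For anyone who hasn't seen the framework before, here is a toy GA in the spirit of that tutorial, evolving bit strings toward all 1s (the classic OneMax problem). The score/select/crossover/mutate loop is the part that generalizes; in the email setting the fitness function would be a real feedback signal like conversion rate. All constants here are arbitrary.

```python
import random

LENGTH, POP, GENERATIONS, MUTATION = 20, 30, 60, 0.02

def fitness(bits):
    # Stand-in objective: count of 1 bits (replace with real feedback).
    return sum(bits)

def crossover(a, b):
    # Single-point crossover between two parents.
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(bits):
    # Flip each bit independently with small probability.
    return [1 - b if random.random() < MUTATION else b for b in bits]

random.seed(1)
pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]  # truncation selection, elites kept as-is
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
```

The ai-junkie tutorial uses roulette-wheel selection rather than the truncation selection shown here; either works for getting a feel for the mechanics.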