> The minimum threshold odds [to stop showing a variation] are calculated by 40% / number enabled variations
Uh... How did you choose those cutoffs? Looks like you have a significant chance of making the wrong choice.
Also:
> Once enough data is collected to start making conclusions (1000 sends per variation)
You should check out the Bayesian solution to the Multi-Armed Bandit problem. It's very close to what you are doing, but makes decisions much faster than you do because it isn't deciding to turn off a variation, merely to scale it down.
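To make the idea concrete, here is a minimal sketch of Thompson sampling, one common Bayesian approach to the multi-armed bandit problem. Each variation keeps a Beta posterior over its conversion rate; on every send you sample from each posterior and pick the largest sample, so weak variations get scaled down smoothly rather than switched off at a hard threshold. The variation counts and true rates below are made up for illustration.

```python
import random

def choose_variation(stats):
    """stats: list of (successes, failures) per variation.

    Sample each variation's Beta(successes+1, failures+1) posterior
    and return the index of the largest sample.
    """
    samples = [random.betavariate(s + 1, f + 1) for s, f in stats]
    return max(range(len(samples)), key=lambda i: samples[i])

def update(stats, i, converted):
    """Record one send's outcome for variation i."""
    s, f = stats[i]
    stats[i] = (s + 1, f) if converted else (s, f + 1)

# Toy run: three variations with hypothetical true conversion rates.
stats = [(0, 0), (0, 0), (0, 0)]
true_rates = [0.03, 0.05, 0.04]
for _ in range(5000):
    i = choose_variation(stats)
    update(stats, i, random.random() < true_rates[i])
```

Note that a variation is never permanently excluded: a posterior with little mass near the top still occasionally wins a sample, so the algorithm keeps exploring while exploiting the current leader.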
Thanks a lot for the feedback. This is a first-pass implementation, but I agree that more thought should be put into the cutoff threshold, specifically for when there are initially only 2 (or maybe 3) variations.
I would suggest that you put together a quick Monte Carlo simulation for any of the models you're experimenting with to see how well they perform when you actually know the true conversion rates. There are plenty of theoretical issues you can find with any method, and the more complex your approach, the harder it is to work it all out with pencil and paper. Likewise, because you're dealing with probabilistic solutions, real-world results can be deceptive (for example, conversion rates may naturally fluctuate between weeks or months). I've found that testing with simulations is the best way to get a real sense of how whatever method you employ will work.
I would ignore any non-Bayesian MAB posts out there. The formulation used by other approaches considers an infinite number of repeated trials, which is basically an insane assumption. Epsilon-greedy and UCB1 aren't optimal except under that assumption.
Except in that example the author is choosing E based on the observed difference between two means (which is exactly the unknown you're trying to determine, so it makes no sense to use it as a constant in the formula), rather than as the threshold for the minimum difference you care about.
If you're going the classical statistics route, the entire point is that you need to determine your sample size before you peek at the data. In that post you would need to replace E with the minimum difference you care about detecting, calculate n before you start the test, and not look at the results until you had reached n observations.
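For reference, the fixed-horizon calculation looks roughly like this. It uses the standard two-sample approximation n ≈ 2 (z_α/2 + z_β)² p(1−p) / E², with the conventional 5% two-sided significance and 80% power; the baseline rate and minimum detectable difference below are placeholder numbers.

```python
import math

def sample_size(p_baseline, min_detectable_diff,
                z_alpha=1.96, z_beta=0.84):
    """Per-variation sample size for a two-sample proportion test.

    z_alpha=1.96 and z_beta=0.84 correspond to 5% two-sided
    significance and 80% power.
    """
    p = p_baseline
    return math.ceil(2 * (z_alpha + z_beta) ** 2
                     * p * (1 - p) / min_detectable_diff ** 2)

# E.g. a 5% baseline conversion rate, caring about an absolute
# lift of 1 percentage point:
n = sample_size(0.05, 0.01)
```

The key discipline is that E is chosen from business judgment up front, n is computed from it, and the data isn't examined until n observations per variation have arrived; peeking early inflates the false-positive rate.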
However, with the multi-armed bandit approach it does make sense to look at the data continuously, since it's a continuous optimization problem and the overall average conversion rate ends up higher.
That visualwebsiteoptimizer page is mathematically confused and its simulation is fatally flawed; I wish it would disappear from the internet. Check the comments.
With email optimization in particular, be warned that people will respond to a change simply because it is different, but then acclimatize.
What that means is that the changed version tends to win the test but then may or may not keep performing well. The flip side is that if you have the choice, keep multiple variations of the same email in rotation so that people don't get too used to your emails.
Nice to see GAs applied to more things. People often do not realize how many problems with a feedback mechanism can be solved within the GA framework. Here is the small tutorial that got me started with GAs a long, long time ago: http://www.ai-junkie.com/ga/intro/gat1.html
The neural network tutorial on the same site is pretty cool too.
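For anyone who hasn't seen the framework before, here is a toy GA in the spirit of that tutorial, evolving bit strings toward all 1s (the classic OneMax problem). The score/select/crossover/mutate loop is the part that generalizes; in the email setting the fitness function would be a real feedback signal like conversion rate. All constants here are arbitrary.

```python
import random

LENGTH, POP, GENERATIONS, MUTATION = 20, 30, 60, 0.02

def fitness(bits):
    # Stand-in objective: count of 1 bits (replace with real feedback).
    return sum(bits)

def crossover(a, b):
    # Single-point crossover between two parents.
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(bits):
    # Flip each bit independently with small probability.
    return [1 - b if random.random() < MUTATION else b for b in bits]

random.seed(1)
pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]  # truncation selection, elites kept as-is
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
```

The ai-junkie tutorial uses roulette-wheel selection rather than the truncation selection shown here; either works for getting a feel for the mechanics.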