Some of those tools look great for doing A/B testing on websites, but are there any guidelines out there on programming design patterns for implementing A/B testing in generic software? I'm reluctant to litter my code with statements like:
if (userid % 2 == 0) {
    // do test A logic
} else {
    // do test B logic
}
You mean software not running on a server? Well, if you've got a Java enterprise programmer's love for design patterns, you could do A/B tests using a Strategy pattern. And if you wanted to decouple that from your code, you could have the Strategies created by a Factory. And if you configured your StrategyFactory in XML then you would never have any A/B testing code in your code at all... and this topic is giving me flashbacks so I'm going to stop now.
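A minimal sketch of that Strategy-plus-Factory shape, assuming a hypothetical checkout-button experiment (the interface, class names, and button copy are all invented for illustration):

```java
// Each variant's logic lives behind a common interface.
interface CheckoutStrategy {
    String renderButton();
}

class OneClickCheckout implements CheckoutStrategy {
    public String renderButton() { return "Buy now"; }
}

class TwoStepCheckout implements CheckoutStrategy {
    public String renderButton() { return "Add to cart"; }
}

class StrategyFactory {
    // Same deterministic split as userid % 2, but hidden behind the factory,
    // so calling code never sees the experiment branch.
    static CheckoutStrategy forUser(long userId) {
        return (userId % 2 == 0) ? new OneClickCheckout() : new TwoStepCheckout();
    }
}
```

Calling code then just does `StrategyFactory.forUser(userId).renderButton()` and stays ignorant of which variant it got.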
The nuts and bolts of doing this in downloadable software are not extraordinarily difficult. Pick a unique random identifier at install time, then report that identifier along with conversion events to the central server. (Passing it as a query parameter when folks open your website from within the app is so easy it is almost cheating. You can also ask folks for a "hardware ID" to generate their license key, or something similar.)
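A sketch of that install-time identifier, assuming a Java desktop app that opens URLs on the vendor's site (the class name, parameter name, and `install_id` key are hypothetical):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class InstallId {
    // Generated once; a real app would persist this on first run
    // instead of regenerating it per process.
    static final String ID = UUID.randomUUID().toString();

    // Tack the identifier onto any link the app opens, so server-side
    // conversions can be joined back to this install.
    static String conversionUrl(String base) {
        return base + "?install_id=" + URLEncoder.encode(ID, StandardCharsets.UTF_8);
    }
}
```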
See the presentation Paras linked to if you need implementation advice.
Frameworks like Patrick's A/Bingo and Vanity are specifically meant for integrating A/B testing within software. In fact, I remember a presentation by Patrick where he outlined how easy it is to adapt his library to other languages.
Ultimately that's what you have to do - you've got two different code paths, so there needs to be an if or a function pointer in there somewhere to distinguish them. As patio11 mentions, you can hide that with Strategy patterns, closures, frameworks, etc., but I'm not sure it buys you anything. You'd still need to retrofit your code to use those patterns, which may be more invasive than just using an if-statement. (Particularly since you'll want to remove the code if the experiment doesn't pan out, to avoid bloat.)
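For illustration, the "function pointer" flavor might look like this minimal sketch (all names and strings here are invented): the if still exists, but each variant's logic sits in its own method, so ending the experiment means deleting one method and one branch.

```java
import java.util.function.Supplier;

class PricingPage {
    static String variantA() { return "$9/mo"; }
    static String variantB() { return "$10/mo, first month free"; }

    static String render(long userId) {
        // The unavoidable branch, expressed as a choice of method reference.
        Supplier<String> variant = (userId % 2 == 0)
                ? PricingPage::variantA
                : PricingPage::variantB;
        return variant.get();
    }
}
```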
Another option is to branch your codebase and then proxy requests through to a new appserver running on the branch. This keeps the individual codebases simple, but merges suck - and if you don't stop development entirely, you'll need to be merging several times over the length of the experiment. (Also, this is one way experiments go wrong - a change to an unrelated feature can often have unexpected effects on your data.) It's also a deployment pain if you're just a startup with a couple of developers.
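As one hedged illustration of that proxy setup, nginx's `split_clients` module can send a fixed fraction of traffic to an appserver running the experiment branch (the upstream names, cookie, and ports below are assumptions):

```nginx
http {
    # Hash a stable per-user value so each visitor sticks to one variant.
    split_clients "${cookie_uid}" $ab_backend {
        50%  experiment;
        *    mainline;
    }
    upstream mainline   { server 127.0.0.1:8000; }  # trunk appserver
    upstream experiment { server 127.0.0.1:8001; }  # branch appserver
    server {
        listen 80;
        location / {
            proxy_pass http://$ab_backend;
        }
    }
}
```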