Overfitting, the curse of dimensionality, etc. As in many machine learning tasks, extracting good starting-point features is very important, and that often requires domain-specific knowledge.
I think it should work on many other distributions. If the distribution is not known a priori, you'll probably need to do density estimation on your keys and partition your search space accordingly.
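To make the "estimate the density, then partition" idea concrete, here is a rough sketch in Python. It is only one way to do it, not necessarily the scheme discussed above: it uses equal-count buckets (cuts at the empirical quantiles) as a crude stand-in for density estimation, so dense regions of the key space get narrower buckets. The names `build_partitions` and `lookup` are made up for illustration, and the sketch assumes a non-empty, already sorted list of keys.

```python
import bisect

def build_partitions(sorted_keys, num_buckets):
    """Split sorted_keys into buckets holding roughly equal numbers of keys.

    Cutting at equal counts is the same as cutting at quantiles of the
    empirical distribution, a crude form of density estimation.
    Assumes sorted_keys is non-empty and sorted ascending.
    """
    n = len(sorted_keys)
    # Bucket i covers the index range starts[i]:starts[i + 1].
    starts = [(i * n) // num_buckets for i in range(num_buckets)] + [n]
    # The first key of every bucket except bucket 0 serves as a boundary.
    boundaries = [sorted_keys[s] for s in starts[1:-1]]
    return starts, boundaries

def lookup(sorted_keys, starts, boundaries, key):
    """Return an index of key in sorted_keys, or -1 if it is absent."""
    # Step 1: pick the bucket whose key range contains `key`.
    b = bisect.bisect_right(boundaries, key)
    lo, hi = starts[b], starts[b + 1]
    # Step 2: search only inside that bucket.
    i = bisect.bisect_left(sorted_keys, key, lo, hi)
    if i < hi and sorted_keys[i] == key:
        return i
    return -1
```

With skewed keys, the buckets end up narrow where keys are dense and wide where they are sparse, so each bucket holds about the same number of keys. If the distribution drifts over time, the boundaries would need to be rebuilt from a fresh sample.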