It's been years since I used CMU Sphinx, but don't you have to bring your own training data? Sure, there are free data sets out there, and pre-trained models, but they are not as good as what Google et al. have.
Yes. And it's not just the data sets, it's the fundamental technology. You really need state-of-the-art models, i.e. LSTM/CTC, to deal with noisy input data and to get to 99% accuracy (in addition to excellent data sets, of course).