"all FF3.5 jobs are from a long time ago, before we’d had much time to streamline and errorproof our system."
"This is the same graph as above, with the unfairly advantaged or disadvantaged browsers removed"
What? Why not take out all results from the period when your system was unstable, rather than removing the results that looked 'wrong' to you? Good grief.
The system was never very unstable. The error rates are all low. The point is that it's now much more stable.
The reason I didn't break error rates down by month was to preserve statistical significance. If we only look at error rates from specific periods of time, we have to decide how long those periods are. Too long and we don't have data from some of the browsers; too short and we don't have enough data to draw a statistically significant conclusion. It may have been possible to find a middle ground, but why bother when we can still get unbiased results for 80% of the browsers?
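To make the "too short" half of that trade-off concrete, here is a rough sketch of how the uncertainty on an error rate grows as the number of jobs in a time window shrinks. The job counts and the 2% rate are made-up numbers for illustration, not figures from our data, and the Wilson score interval is just one reasonable choice of interval, not necessarily what was used for the graphs.

```python
import math

def wilson_interval(errors, jobs, z=1.96):
    """Approximate 95% Wilson score interval for an observed error rate."""
    if jobs == 0:
        # No data at all: the rate could be anything.
        return (0.0, 1.0)
    p = errors / jobs
    denom = 1 + z**2 / jobs
    centre = (p + z**2 / (2 * jobs)) / denom
    half = z * math.sqrt(p * (1 - p) / jobs + z**2 / (4 * jobs**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

# Hypothetical window sizes, all with the same observed 2% error rate.
for jobs in (50, 500, 5000):
    errors = round(jobs * 0.02)
    lo, hi = wilson_interval(errors, jobs)
    print(f"{jobs:>5} jobs: plausible error rate {lo:.2%} to {hi:.2%}")
```

The smaller the window, the wider the interval, so two browsers with the same underlying error rate can look quite different purely by chance.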
I get what you're saying, but you still removed data that you thought was wrong in order to give the results you were expecting. Who's to say that the results for Chrome in that period were not also wrong?