There's a huge black art to interpreting data. It's not just confidence intervals and significance tests: you also need to watch very closely for any sources of bias in your data, such as different user populations, unexpected feature interactions, bugs in your logging code, or changes in the site midway through your experiment period.
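
To make the "different user populations" point concrete, here is a minimal sketch of how a population mix can flip a result. Everything in it is an assumption for illustration: the variant names, the "new" and "returning" segments, and the counts are invented, not taken from any real experiment. Variant B looks better when all users are pooled, yet variant A converts better within each segment, because the two variants were shown to very different mixes of users.

```python
# Hypothetical (conversions, visitors) counts, invented purely for illustration.
counts = {
    "A": {"new": (81, 87), "returning": (192, 263)},
    "B": {"new": (234, 270), "returning": (55, 80)},
}


def rate(conversions, visitors):
    """Conversion rate as a fraction between 0 and 1."""
    return conversions / visitors


def pooled_rate(variant):
    """Conversion rate for a variant with all segments lumped together."""
    conversions = sum(c for c, _ in counts[variant].values())
    visitors = sum(v for _, v in counts[variant].values())
    return rate(conversions, visitors)


def segment_rate(variant, segment):
    """Conversion rate for a variant within a single user segment."""
    return rate(*counts[variant][segment])


for variant in counts:
    print(variant, "pooled:", round(pooled_rate(variant), 3))
    for segment in counts[variant]:
        print("   ", segment + ":", round(segment_rate(variant, segment), 3))
```

A significance test on the pooled numbers alone would not reveal this; it only shows up once the data is broken down by segment, which is one reason to check how users were distributed across variants before trusting a headline number.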