The fact that the 95% confidence intervals of two variables have some overlap doesn't mean there's a >5% chance that the expected values of the two variables are the same.
Consider two independent random variables X and Y: the chance that (a sample from X is above the 90th percentile of the true distribution of X) is 10%, but the chance that (a sample from X is above the 90th percentile of the true distribution of X AND a sample from Y is below the 10th percentile of the true distribution of Y) is only 10% × 10% = 1%.
(disclaimer: with actual science the stats are a lot more complicated and you can't just assume the variables are independent and multiply the two probabilities; it's just a simplified example to give intuition about why overlapping confidence intervals don't imply what the parent thought. IANAstatistician)
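A quick simulation makes the multiplication concrete (assuming both X and Y are standard normals, purely for illustration; the 1.2816 cutoff is the standard normal's 90th percentile):

```python
import random

random.seed(0)
N = 100_000
hits_x = hits_joint = 0
for _ in range(N):
    x = random.gauss(0, 1)   # independent sample from X
    y = random.gauss(0, 1)   # independent sample from Y
    if x > 1.2816:           # above X's 90th percentile: ~10% of the time
        hits_x += 1
        if y < -1.2816:      # AND below Y's 10th percentile: ~10% of those
            hits_joint += 1

p_x = hits_x / N             # close to 0.10
p_joint = hits_joint / N     # close to 0.10 * 0.10 = 0.01
```

The joint event is an order of magnitude rarer than either event alone, which is the whole point of the example.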
Overlapping confidence intervals do not mean there's a > x% chance that the two variables' expected values are the same. If the intervals overlap, the difference is not statistically significant.
Your example about random variables is largely misinformed. You're talking about things as if they are individual values, but we're talking about sample means. The probability that a sample mean for a large sample is above the 90th percentile is massively lower than 10%, and depends on n. The joint probability of getting two sample means above some threshold is irrelevant.
Confidence intervals don't tell you the probability that the true mean is above X. They tell you, bluntly, the range of values where the true mean could be, with 95% confidence ("If I were to do this experiment 100 times, based on the results I got, I would expect the true mean to be within this range").
You can play with some numbers and methods, but you can rest pretty sure that a material effect size is probably not rigorously evidenced if the intervals overlap.
> If the intervals overlap, the difference is not statistically significant.
Demonstrably false. Obvious counterexample: the study in the OP, which has overlapping confidence intervals and a statistically significant difference.
Proof: just calculate the 95% confidence interval for the difference between the two means. You can figure out what the stddev was from half the confidence interval divided by the z-score for a 95% confidence interval, 1.96, and you get 1.02 and 1.30 for the two groups. Then the confidence interval is:
(10.4 - 6.3) +/- 1.96*sqrt(1.02^2 + 1.30^2)
gives [0.86, 7.34].
This does not include 0, therefore the difference is significant.
> The probability that a sample mean for a large sample is above the 90th percentile is massively lower than 10%, and depends on n.
I was trying to give a basic intuition about normal distributions with a simple example; the distribution of a single sample is just a simpler example of a different normal distribution. Yes, obviously the distribution of an estimate of X given lots of samples is not the same as the distribution of a single sample; I never claimed it was.
> You can figure out what the stddev was from half the confidence interval divided by the z-score for a 95% confidence interval, 1.96, and you get 1.02 and 1.30 for the two groups.
I'm not really interested in double-checking your math, but you cannot derive the standard deviation from a sample-mean confidence interval without considering the sample size. You seem to be making the same mistake again, confusing the z-score of a single value with the z-score of a sample mean. And you're actually looking at a difference of proportions, where the values are either 1 or 0, so the standard deviation is of course going to be much larger than 1%.
Ignoring that, and assuming you meant to say standard error (where your math appears to work at a glance): in general, sure, overlapping confidence intervals don't mean that statistical tests of mean difference won't be significant. But if you don't have non-overlapping intervals, your effect size is probably pretty small. I would not put a lot of faith in these particular results as strong evidence of anything.
I would advocate for people to just look for overlapping curves.
> Yes obviously the distribution of an estimate of X given lots of samples is not the same as the distribution of a single sample, I never claimed it was.