I'm reading this outcome as the reverse of the examples above. Or perhaps it's about identifying the correct stat to use based on your goals.
In other words, in this case I don't really care what the national average is. I care about my house, my street, my area.
In other cases, like in marketing, the stat that matters first is overall net profit. From there we can burrow down to understand the factors, in which case we come across business share before marketing spend.
In the networking example, the goal is usage (throughput), not speed or latency.
Drawing the wrong stat first leads to incorrect conclusions.
Right, one of the interesting things about Simpson's paradox is that there's not a uniform right answer: sometimes you care about the overall average, sometimes you care about the averages of subpopulations. You have to judge that based on the situation.
One of the other comments linked [1], which includes Judea Pearl's analysis of Simpson's paradox from a causal inference point of view [2]. It lays this out nicely, though it may not be easy to understand: it took me many hours of study to get comfortable with Pearl's causal inference work, even with a strong stats background.
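To make the point about overall versus subpopulation averages concrete, here's a minimal sketch in Python. The counts are the often-quoted kidney stone treatment figures from textbook discussions of Simpson's paradox, not anything from this thread, and the names (`data`, `subgroup_rates`, `overall_rate`) are just made up for illustration.

```python
from fractions import Fraction

# (successes, trials) per treatment and subgroup.
# Illustrative textbook counts, not data from this discussion.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def subgroup_rates(treatment):
    """Success rate within each subgroup for one treatment."""
    return {
        group: Fraction(successes, trials)
        for group, (successes, trials) in data[treatment].items()
    }


def overall_rate(treatment):
    """Success rate after pooling all subgroups for one treatment."""
    successes = sum(s for s, _ in data[treatment].values())
    trials = sum(n for _, n in data[treatment].values())
    return Fraction(successes, trials)


for t in data:
    parts = ", ".join(
        f"{group}: {float(r):.1%}" for group, r in subgroup_rates(t).items()
    )
    print(f"{t}: {parts}, overall: {float(overall_rate(t)):.1%}")
```

A comes out ahead within each subgroup while B comes out ahead once the groups are pooled, because the two treatments were applied to very different mixes of cases. Which of those comparisons you should act on is exactly the judgment call: the code can compute both, but it can't tell you which one answers your question.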
> like in marketing, the stat that matters first is overall net profit.
I have a take on that. The stat that matters is the profit per unit of non-scalable business resource, as in how much management, marketing, sales, accounting, and engineering time the product takes per unit. It's important because those are often hard to scale. You can have a low-margin product that requires zip of the above, and it's good business. And the reverse: a high-margin product that requires too much of the above is bad business.
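A rough sketch of that arithmetic, with two entirely hypothetical products and made-up numbers (the `Product` class and its fields are assumptions for illustration only). The low-margin product is given a small nonzero staff time rather than literally zero, so the ratio stays defined.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    margin_per_unit: float       # profit per unit sold (hypothetical)
    staff_hours_per_unit: float  # management, sales, engineering, etc. time per unit (hypothetical)

    def profit_per_staff_hour(self) -> float:
        """Profit earned per hour of hard-to-scale staff time."""
        return self.margin_per_unit / self.staff_hours_per_unit


products = [
    Product("low-margin, hands-off", margin_per_unit=2.0, staff_hours_per_unit=0.01),
    Product("high-margin, hands-on", margin_per_unit=50.0, staff_hours_per_unit=2.0),
]

# Rank by profit per scarce staff hour rather than by raw margin.
ranked = sorted(products, key=lambda p: p.profit_per_staff_hour(), reverse=True)

for p in ranked:
    print(f"{p.name}: {p.profit_per_staff_hour():.0f} profit per staff hour")
```

With these invented numbers the low-margin product ranks first, even though its per-unit margin is 25 times smaller.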