> Estimation theory makes a lot of sense - to me a lot more than pulling priors out of thin air.
You're "pulling priors out of thin air" whether you realize it or not; it's the only way that estimation makes sense mathematically. Frequentist statistics is broadly equivalent to Bayesian statistics with a flat prior distribution over the parameters, and which distribution counts as "flat" ultimately depends on how the model is parameterized, which is in principle an arbitrary choice - something that's being "pulled out of thin air". Of course, Bayesian statistics also often involves assigning "uninformative" priors out of pure convenience, and frequentists can use "robust" statistical methods to exceptionally take prior information into account; so the difference is smaller than you might expect.
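To make the parameterization point concrete, here's a minimal numerical sketch (hypothetical coin-flip data, numpy assumed): the MLE is invariant to reparameterization, but "MAP with a flat prior" is not - a flat prior on p and a flat prior on logit(p) give different estimates from the same data and likelihood.

```python
import numpy as np

# Hypothetical data: k successes in n Bernoulli trials.
k, n = 7, 10
p = np.linspace(1e-6, 1 - 1e-6, 1_000_001)
log_lik = k * np.log(p) + (n - k) * np.log(1 - p)

# MAP with a flat prior on p: posterior ∝ likelihood, so MAP = MLE = k/n.
map_flat_p = p[np.argmax(log_lik)]

# A flat prior on theta = logit(p) induces the prior 1/(p(1-p)) on p
# (change of variables), so the posterior over p is a different function.
log_post_flat_logit = log_lik - np.log(p) - np.log(1 - p)
map_flat_logit = p[np.argmax(log_post_flat_logit)]

print(round(map_flat_p, 3))      # 0.7  = k/n
print(round(map_flat_logit, 3))  # 0.75 = (k-1)/(n-2)
```

Same data, same likelihood - but the two "flat" priors disagree on the estimate, which is exactly the sense in which "flat" is an arbitrary, parameterization-dependent choice.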
There's also a strong argument against NHST specifically that works from both a frequentist and a Bayesian perspective: NHST violates the likelihood principle (https://en.wikipedia.org/wiki/Likelihood_principle), so one could even ask whether NHST is "properly" frequentist at all.
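To make the likelihood-principle point concrete, here's the classic Lindley-Phillips coin example (scipy assumed available): 9 heads and 3 tails give the exact same likelihood function whether n = 12 tosses were fixed in advance or tossing continued until the 3rd tail, yet the two designs produce different p-values for H0: p = 0.5, because the p-value depends on unobserved outcomes through the stopping rule.

```python
from scipy.stats import binom

# Observed data: 9 heads, 3 tails. Likelihood ∝ p^9 (1-p)^3 either way.

# Design A: n = 12 tosses fixed in advance (binomial sampling).
# One-sided p-value: P(X >= 9 heads | n = 12, p = 0.5).
p_binom = 1 - binom.cdf(8, 12, 0.5)

# Design B: toss until the 3rd tail (negative binomial sampling).
# One-sided p-value: P(12 or more tosses needed)
#                  = P(at most 2 tails in the first 11 tosses | p = 0.5).
p_negbinom = binom.cdf(2, 11, 0.5)

print(round(p_binom, 3))     # 0.073 - not significant at 0.05
print(round(p_negbinom, 3))  # 0.033 - significant at 0.05
```

Identical data and likelihood, opposite significance verdicts depending on the experimenter's intentions - which is precisely what the likelihood principle forbids.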
> You're "pulling priors out of thin air" whether you realize it or not
No, you are not. That's an argument I often see put forward by people who want the Bayesian approach to be the one true approach. There are no priors whatsoever involved in a frequentist analysis.
People who say that are generally referring to MLE being equivalent to MAP estimation with a uniform prior over the parameter region. That's true, but it's the usual mistake I'm complaining about: reducing all frequentist estimation to MLE.
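As a concrete instance of an estimator that isn't MLE (a simulated sketch; the data is made up): the method-of-moments estimator for the upper bound of a Uniform(0, theta) distribution is a perfectly standard frequentist estimator that differs from the MLE and has no natural reading as a MAP estimate under some flat prior.

```python
import numpy as np

# Hypothetical sample from Uniform(0, theta) with true theta = 10.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)

# MLE for Uniform(0, theta) is the sample maximum (always biased low).
theta_mle = x.max()

# Method of moments: E[X] = theta / 2, so solve 2 * mean(x) = theta.
theta_mom = 2 * x.mean()

print(theta_mle, theta_mom)  # two different, legitimate estimates
```

Method of moments is defined by matching sample moments to population moments, not by maximizing anything - so "MLE = MAP with a flat prior" says nothing about it.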
The assertion in itself doesn’t make sense.
> Of course, Bayesian statistics also often involves assigning "uninformative" priors out of pure convenience
That’s very hand-wavy. The issue is that priors have a significant impact on posteriors, one which is often deeply misunderstood by casual statisticians.
Frequentists' big complaint about priors is that they are subjective and influence the conclusions of the study. But the frequentist approach is equivalent to using a non-informative prior, which is itself a subjective choice that influences the conclusions of the study: it amounts to assuming we know literally nothing about the phenomenon under examination beyond the collected data, which is almost never true.
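A small closed-form illustration of that sensitivity (hypothetical data, conjugate Beta-Binomial model so the posteriors are exact): with only 5 observations, two common "uninformative" priors already give different posterior means, and neither matches the MLE.

```python
# Beta(a, b) prior + k successes in n Bernoulli trials
# -> Beta(a + k, b + n - k) posterior, posterior mean (a + k) / (a + b + n).
k, n = 2, 5

mle = k / n                                   # 0.4
mean_uniform = (1 + k) / (1 + 1 + n)          # "uniform" Beta(1, 1) prior
mean_jeffreys = (0.5 + k) / (0.5 + 0.5 + n)   # Jeffreys Beta(0.5, 0.5) prior

print(mle, mean_uniform, mean_jeffreys)  # 0.4, ~0.429, ~0.417
```

The gap shrinks as n grows, but with small samples the choice among "uninformative" priors is doing real inferential work - which is the point about non-informative priors being a subjective choice rather than the absence of one.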
> There are no prior whatsoever involved in a frequentist analysis.
It may not be everywhere, but even in the simplest case of NHST, there certainly is one: the null hypothesis H0 is the assumption of no difference, and every probability the test reports is computed under that assumption. And NHST is basically the topic of this entire thread: it's what we should have stopped teaching a long time ago.
Let’s say you run the most basic regression Y = X beta + epsilon. The X is chosen out of the set of all possible regressors Z (say you run income ~ age + sex, where you also could have used education, location, whatever).
Is that not equivalent to a prior that the coefficient on variables in Z but not in X is zero?
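One way to make that reading literal (a simulated sketch with numpy; the variables and coefficients are made up): dropping a regressor is the limiting case of a MAP/ridge estimate whose Gaussian prior on that coefficient has variance shrinking toward zero, i.e. a point mass at zero.

```python
import numpy as np

# Simulated data where education genuinely matters.
rng = np.random.default_rng(1)
n = 200
age = rng.uniform(20, 60, n)
sex = rng.integers(0, 2, n).astype(float)
edu = rng.uniform(8, 20, n)
income = 1.0 * age + 5.0 * sex + 2.0 * edu + rng.normal(0, 1, n)

Z = np.column_stack([np.ones(n), age, sex, edu])  # full design
X = Z[:, :3]                                       # edu omitted

# Plain OLS with edu left out of the model.
beta_omit, *_ = np.linalg.lstsq(X, income, rcond=None)

# MAP estimate on the FULL design with a Gaussian prior on the edu
# coefficient whose variance -> 0 (penalty lam -> inf): a point mass at 0.
lam = 1e10
P = np.zeros((4, 4))
P[3, 3] = lam
beta_map = np.linalg.solve(Z.T @ Z + P, Z.T @ income)

print(beta_map[:3] - beta_omit)  # essentially zero
print(beta_map[3])               # forced to essentially zero
```

The first three coefficients of the heavily-penalized full model coincide with the omitted-variable OLS fit: "not including education in X" and "a dogmatic prior that education's coefficient is zero" are the same estimator.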