It does not, at least not explicitly, and that is a distinguishing feature of policy optimization algorithms. I don't think anything about this makes it worse for non-deterministic problems. Note that the policy itself can still be stochastic if desired (though I'm not sure that's a good idea in general). A nice feature of black-box policy optimization is that it is almost trivial to apply to either stochastic or deterministic problems: the optimizer only ever sees the returns produced by running the policy, so it does not need to know whether those returns are noisy.
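To make that last point concrete, here is a minimal sketch of black-box policy search on a toy problem. Everything in it is an assumption made for illustration: the 1-D task, the linear policy `action = theta * x`, the names `episode_return`, `mean_return`, and `hill_climb`, and the choice of simple hill climbing as the optimizer. The only thing that changes between the deterministic and the stochastic problem is the `noise_std` argument; the search loop is identical and just averages returns over a few episodes.

```python
import random


def episode_return(theta, noise_std, rng, steps=20):
    """Run one episode of a hypothetical 1-D task and return the total reward.

    The state x should be driven to zero. The policy is deterministic and
    linear (action = theta * x); the environment adds Gaussian noise to the
    transition when noise_std > 0.
    """
    x = 1.0
    total = 0.0
    for _ in range(steps):
        action = theta * x
        x = x + action + rng.gauss(0.0, noise_std)
        total -= x * x  # reward is the negative squared distance from zero
    return total


def mean_return(theta, noise_std, rng, episodes):
    """Estimate the expected return by averaging over several episodes."""
    total = sum(episode_return(theta, noise_std, rng) for _ in range(episodes))
    return total / episodes


def hill_climb(noise_std, seed=0, iterations=300, episodes=20, step=0.1):
    """Black-box search over theta using only sampled returns.

    No value function or model is learned. The current theta is re-evaluated
    every iteration so that one lucky noisy estimate cannot freeze the search.
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        candidate = theta + rng.gauss(0.0, step)
        current_score = mean_return(theta, noise_std, rng, episodes)
        candidate_score = mean_return(candidate, noise_std, rng, episodes)
        if candidate_score > current_score:
            theta = candidate
    return theta


if __name__ == "__main__":
    # Same optimizer, same code path; only the environment noise differs.
    print("deterministic:", hill_climb(noise_std=0.0))
    print("stochastic:   ", hill_climb(noise_std=0.1))
```

In this toy task the best linear policy is `theta = -1`, which cancels the state in a single step, so both runs should end up near that value, with the stochastic run typically a bit less precise because its return estimates are noisy.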