I don't know about strictly superior. It's certainly strictly easier for people with a budget, who just need "good enough" results the first try. I don't have any evidence whatsoever, but I'd expect that enough tuning and retries can get squeeze a bit more performance out of RLHF than you can get out of DPO.