Precisely. But that might be a local optimum. "Show him boobs and action trailers" is guaranteed to make him stay another 40 minutes.
But perhaps there is a riskier strategy that takes longer to craft and actually delivers hours and hours of content to the user (but has to fail for a while before getting there).
It seems like reinforcement learning would be useful here. At a high level, forming a policy for recommendations means balancing exploration (experimenting with riskier recommendations) against exploitation (showing recommendations it already knows will likely lead to clicks), with click-throughs, time spent watching the video, etc. as reward signals.
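To make that trade-off concrete, here's a minimal sketch of an epsilon-greedy bandit, which is about the simplest form of the idea (it's a toy, not full RL, and not a claim about what any real recommender runs). The function name `epsilon_greedy`, the per-item watch probabilities, and the 0/1 "did he watch it" reward are all made up for illustration.

```python
import random

def epsilon_greedy(true_watch_prob, epsilon=0.1, steps=10_000, seed=0):
    """Simulate epsilon-greedy recommendations over a fixed set of items.

    true_watch_prob[i] is a hypothetical chance that the user watches item i.
    Returns (estimated value per item, times each item was shown, total reward).
    """
    rng = random.Random(seed)
    n = len(true_watch_prob)
    counts = [0] * n      # how often each item was recommended
    values = [0.0] * n    # running average reward per item
    total = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random item, even one that looks bad so far.
            arm = rng.randrange(n)
        else:
            # Exploit: show the item with the best estimate (ties -> lowest index).
            arm = max(range(n), key=lambda i: values[i])

        # Assumed reward signal: 1 if the user watched, 0 otherwise.
        reward = 1 if rng.random() < true_watch_prob[arm] else 0

        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward

    return values, counts, total


if __name__ == "__main__":
    values, counts, total = epsilon_greedy([0.3, 0.5, 0.8])
    print("estimates:", [round(v, 2) for v in values])
    print("times shown:", counts)
    print("total watches:", total)
```

With `epsilon=0` the policy never explores: it keeps showing whatever it tried first and never finds out that another item would have done better, which is the "stuck in a local optimum" case from above. A small `epsilon` pays for some failed recommendations in exchange for finding the better item.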
Does anyone know whether RL is used for recommendation in practical settings, and if so, what the current state of the art is?