Maybe those are just the videos that you're statistically more likely to watch t...

alphydan · on Sept 4, 2016

Precisely. But that might be a local minimum. "Show him boobs and action trailers" is guaranteed to make him stay another 40 minutes.

But perhaps there is a riskier strategy that takes longer to craft and actually delivers hours and hours of content to the user (but has to fail for longer before getting there).

5olidor · on Sept 4, 2016

It seems like reinforcement learning would be useful here. At a high level, forming a recommendation policy would require balancing exploration (experimenting with riskier recommendations) against exploitation (showing recommendations that it knows will likely lead to clicks), using click-throughs, time spent watching the video, etc. as reward signals.
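To make the exploration/exploitation trade-off concrete, here is a minimal epsilon-greedy bandit sketch. It is a toy illustration only, not how any real recommender works: the function name `epsilon_greedy`, the per-video watch probabilities, and the 0/1 "watched" reward are all assumptions made up for the example.

```python
import random

def epsilon_greedy(true_watch_prob, epsilon=0.1, steps=5000, seed=0):
    """Toy epsilon-greedy recommender (hypothetical, for illustration).

    true_watch_prob[i] is an assumed probability that the user watches
    video i when it is recommended. The reward is 1.0 if watched, else 0.0.
    """
    rng = random.Random(seed)
    n_arms = len(true_watch_prob)
    counts = [0] * n_arms      # how often each video was recommended
    values = [0.0] * n_arms    # running mean reward per video
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random, possibly "risky" recommendation.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: recommend the video with the best estimate so far
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: values[a])

        reward = 1.0 if rng.random() < true_watch_prob[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return counts, values, total_reward


if __name__ == "__main__":
    probs = [0.3, 0.9]  # assumed: video 1 is better, but starts out untried
    print("greedy only:", epsilon_greedy(probs, epsilon=0.0)[0])
    print("with exploration:", epsilon_greedy(probs, epsilon=0.1)[0])
```

With `epsilon=0.0` the policy never tries the second video at all, since the first one's estimate can never drop below the untried one's initial 0.0. That is the "local minimum" situation described above. Any `epsilon > 0` gives the better video a chance to be discovered, at the cost of some wasted recommendations.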

Does anyone know whether RL is used for recommendation in practical settings, and if so what is the current state of the art?

pcovington · on Sept 5, 2016

This is a very natural avenue and an active area of research at Google/DeepMind. Stay tuned...