It hasn't read that riddle because it is a modified version. The model would in fact solve this trivially if it _didn't_ see the original in its training. That's the entire trick.
Recognizing that it is a riddle isn't impressive, true. But the duration of its reasoning is irrelevant, since the riddle works on misdirection. As I keep saying here, give someone uninitiated the 7 wives with 7 bags going (or not) to St Ives riddle and you'll see them reasoning for quite some time before they give you a wrong answer.
If you are tricked about the nature of the problem at the outset, then all reasoning does is drive you further in the wrong direction, making you solve the wrong problem.
Why does it exist 1000 times in the training if there isn't some trick to it, i.e. some subset of humans had to have answered it incorrectly for the meme to replicate that extensively in our collective knowledge.
And remember the LLM has already read a billion other things, and now needs to figure out - is this one of them tricky situations, or the straightforward ones? It also has to realize all the humans on forums and facebook answering the problem incorrectly are bad data.