Your statement doesn't contradict their statement. It produces text that is statistically likely to be arranged that way, but because we use non-deterministic sampling we get a wide variety of results that weren't necessarily in the training data.
No, I don't think they are talking about the decoding process there. If they were, it's a non statement. They are almost certainly talking about training data.