The trick of password rules should be (and this is what xkcd was getting at) to trick users into using passwords that they find easy to memorize, but are still high entropy. Long passwords can be high entropy, but, as highlighted at the end, they can just as easily be low entropy. If they needed to memorize it quickly, most users would pick a low entropy long password. Also, every users opinion of what makes a long password easy to memorize will be about the same - repeated letters, patterns on the keyboard, english words. This means that you can try to detect easy to memorize password, and flag them as bad. However, users will find something easy to memorize you don't block, and use that instead. This new thing will probably be common to all users as well. I once had someone inform a group that they should all use the first three letters of their names to overcome a password restriction. While the restriction probably assumed people would pick a random three letter string, they instead chose an even more obvious one then they would have otherwise.
When making password restrictions, try to force every user to come up with a different opinion of easy to memorize. For example, you could display pictures of ten foods, and ask users to pick their favourite food. Since eveyones likes different food, their choices would probably have a relatively even distribution. Moreover, upon returning, each user would reidentify the same food as their favourite. By extending this to a larger field of subjective values, you could make the overall datapoint high entropy, while maintaining memorizability.
When making password restrictions, try to force every user to come up with a different opinion of easy to memorize. For example, you could display pictures of ten foods, and ask users to pick their favourite food. Since eveyones likes different food, their choices would probably have a relatively even distribution. Moreover, upon returning, each user would reidentify the same food as their favourite. By extending this to a larger field of subjective values, you could make the overall datapoint high entropy, while maintaining memorizability.