They are for all the reasons you explain.
Here's a wonderful paper (which I probably found here on HN) on the subject: On the Measure of Intelligence, by Francois Chollet (working for Google), from 2019:
https://arxiv.org/abs/1911.01547
He proposes many tests that are relatively simple for humans but that AI models cannot solve.
If you don't want to read the whole paper, jump to the figures (starting on page 48).
The goal, of course, is not to train an AI on the correct results of the exact problems in that paper: that'd just be more memory testing.
Once Claude can solve tests like these, things are going to get really wild!