Doesn't seem like you have proof of anything but it does appear that you have something that is very much like religious faith in an unforeseeable inevitability. Which is fine as far as religion is concerned but it's better to not pretend it's anything other than blind faith.
But if you really do have concrete proof of something then you'll have to spell it out better & explain how exactly it adds up to intelligence of such magnitude & scope that no one can make sense of it.
> "religious faith in an unforeseeable inevitability"
For reference, I work in academia, and my job is to find theoretical limitations of neural nets. If there was so much of a modicum of evidence to support the argument that "intelligence" cannot arise from sufficiently large systems, my colleagues and I would be utterly delighted and would be all over it.
Here are a couple of standard elements without getting into details:
1. Any "intelligent" agent can be modelled as a random map from environmental input to actions.
2. Any random map can be suitably well-approximated by a generative transformer. This is the universal approximation theorem. Universal approximation does not mean that models of a given class can be trained using data to achieve an arbitrary level of accuracy, however...
3. The neural scaling laws (first empirical, now more theoretically established under NTK-type assumptions), as a refinement of the double descent curve, assert that a neural network class can get arbitrarily close to an "entropy level" given sufficient scale. This theoretical level is so much smaller than any performance metric that humans can reach. Whether "sufficiently large" is outside of the range that is physically possible is a much longer discussion, but bets are that human levels are not out of reach (I don't like this, to be clear).
4. The nonlinearity of accuracy metrics comes from the fact that they are constructed from the intersection of a large number of weakly independent events. Think the CDF of a Beta random variable with parameters tending to infinity.
Look, I understand the scepticism, but from where I am, reality isn't leaning that way at the moment. I can't afford to think it isn't possible. I don't think you should either.
As I said previously, you are welcome to believe whatever you find most profitable for your circumstances but I don't find your heuristics convincing. If you do come up or stumble upon a concrete constructive proof that 100 trillion transistors in some suitable configuration will be sufficiently complex to be past the aforementioned event horizon then I'll admit your faith was not misplaced & I will reevaluate my reasons for remaining skeptical of Boolean arithmetic adding up to an incomprehensible kind of intelligence beyond anyone's understanding.
Which part was heuristic? This format doesn't lend itself to providing proofs, it isn't exactly a LaTeX environment. Also why does the proof need to be constructive? That seems like an arbitrarily high bar to me. It suggests that you are not even remotely open to the possibility of evidence either.
I also don't think you understand my point of view, and you mistake me for a grifter. Keeping the possibility open is not profitable for me, and it would be much more beneficial to believe what you do.
I didn't think you were a grifter but you only presented heuristics so if you have formal references then you can share them & people can decide on their own what to believe based on the evidence presented.
Fine, that's fair. I believe the statement that you made is countered by my claim, which is:
Theorem. For any tolerance epsilon > 0, there exists a transformer neural network of sufficient size that follows, up to the factor epsilon, the policy that most optimally achieves arbitrary goals in arbitrary stochastic environments.
Proof (sketch). For any stochastic environment with a given goal, there exists a model that maximizes expected return under this goal (not necessarily unique, but it exists). From Solomonoff's convergence theorem (Theorem 3.19 in [1]), Bayes-optimal predictors under the universal Kolmogorov prior converge with increasing context to this model. Consequently, there exists an agent (called the AIXI agent) that is Pareto-optimal for arbitrary goals (Theorem 5.23 in [1]). This agent is a sequence-to-sequence map with some mild regularity, and satisfies the conditions of Theorem 3 in [2]. From this universal approximation theorem (itself proven in Appendices B and C in [2]), there exists a transformer neural network of a sufficient size that replicates the AIXI agent up to the factor epsilon.
This is effectively the argument made in [3], although I'm not fond of their presentation. Now, practitioners still cry foul because existence doesn't guarantee a procedure to find this particular architecture (this is the constructive bit). This is where the neural scaling law comes in. The trick is to work with a linearization of the network, called the neural tangent kernel; it's existence is guaranteed from Theorem 7.2 of [4]. The NTK predictors are also universal and are a subset of the random feature models treated in [5], which derives the neural scaling laws for these models. Extrapolating these laws out as per [6] for specific tasks shows that the "floor" is always below human error rates, but this is still empirical because it works with the ill-defined definition of superintelligence that is "better than humans in all contexts".
[1] Hutter, M. (2005). Universal artificial intelligence: Sequential decisions based on algorithmic probability. Springer Science & Business Media.
Good question. It's because we don't need to be completely optimal in practice, only epsilon close to it. Optimality is undecidable, but epsilon close is not, and that's what the claim says that NNs can provide.
That doesn't address what I asked. The paper I linked proves undecidability for a much larger class of problems* which includes the case you're talking about of asymptotic optimality. In any case, I am certain you are unfamiliar w/ what I linked b/c I was also unaware of it until recently & was convinced by the standard arguments people use to convince themselves they can solve any & all problems w/ the proper policy optimization algorithm. Moreover, there is also the problem of catastrophic state avoidance even for asymptotically optimal agents: https://arxiv.org/abs/2006.03357v2.
* - Corollary 3.4. For any fixed ε, 0 < ε < 1, the following problem is undecidable: Given is a PFA M for which one of the two cases hold:
(1) the PFA accepts some string with probability greater than 1 − ε, or
(2) the PFA accepts no string with probability greater than ε.
Oh yes, that's one of the more recent papers from Hutter's group!
I don't believe there is a contradiction. AIXI is not computable and optimality is undecidable, this is true. "Asymptotic optimality" refers to behaviour for infinite time horizons. It does not refer to closeness to an optimal agent on a fixed time horizon. Naturally the claim that I made will break down in the infinite regime because the approximation rates do not scale with time well enough to guarantee closeness for all time under any suitable metric. Personally, I'm not interested in infinite time horizons and do not think it is an important criterion for "superintelligence" (we don't live in an infinite time horizon world after all) but that's a matter of philosophy, so feel free to disagree. I was admittedly sloppy with not explicitly stating that time horizons are considered finite, but that just comes from the choice of metric in the universal approximation which I have continued to be vague about. That also covers the Corollary 3.4, which is technically infinite time horizon (if I'm not mistaken) since the length of the string can be arbitrary.
But if you really do have concrete proof of something then you'll have to spell it out better & explain how exactly it adds up to intelligence of such magnitude & scope that no one can make sense of it.