the author is either glib, or autistic enough to opine that a clean desk is beyond reproach. i give him the benefit of the doubt and distill the idea that an empty context window offers more room.
Bold of you to assume this is a quick fix. How many software projects have you worked on that went from a buggy, poorly optimized mess to a streamlined, efficient system? I can think of exactly zero from personal experience; all the ones I’ve worked on that were performant at the end had that in mind from their inception.
Given the announcement from a few days ago of Google trying to get external investment, this is their follow-up, showing what that investment is good for. Also, it’s pretty light on details that are of much use to competitors. “We made an accurate simulation system to test our system in before deployment” would be pretty mundane if you were talking about any other field of engineering.
There have been advances recently (in the last year) in scaling deep RL by a significant amount; their announcement is in line with a timeline of running enough experiments to figure out how to leverage that in post-training.
Importantly, this isn’t just throwing more data at the problem in an unstructured way. AFAIK companies are collecting as many git histories as they can and doing something along the lines of: get an LLM to checkpoint pull requests, features, etc. and convert those into plausible input prompts, then run deep RL with passing the acceptance criteria / tests as the reward signal.
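A minimal sketch of what that recipe might look like, assuming hypothetical helpers (summarize_pr_as_prompt, run_tests, policy_update) and a generic policy model; this is an illustration of the described idea, not any lab's actual training code:

    from dataclasses import dataclass

    @dataclass
    class Episode:
        prompt: str    # LLM-generated restatement of the PR / feature as a task prompt
        patch: str     # the policy model's attempted implementation
        reward: float  # 1.0 if the repo's acceptance tests pass, else 0.0

    def build_episodes(pull_requests, model):
        episodes = []
        for pr in pull_requests:
            # 1. Use an LLM to turn the PR (diff + description) into a plausible input prompt.
            prompt = summarize_pr_as_prompt(pr)                 # hypothetical helper
            # 2. Sample a candidate patch from the current policy.
            patch = model.generate(prompt)                      # hypothetical model API
            # 3. Reward signal: does the patch pass the acceptance criteria / tests?
            reward = 1.0 if run_tests(pr.repo, patch) else 0.0  # hypothetical helper
            episodes.append(Episode(prompt, patch, reward))
        return episodes

    # 4. Feed the episodes into an RL post-training step, e.g. a policy-gradient update.
    # policy_update(model, build_episodes(pull_requests, model))  # hypothetical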
So, the hidden mental model that the OP is expressing but failed to elucidate is that LLMs can be thought of as compressing related concepts into approximately orthogonal subspaces of the vector space that is upper-bounded by the superposition of all of their weights. Since training has the effect of compressing knowledge into subspaces, a necessary corollary is that there are now regions within the vector space that contain very little. Those are the valleys that need to be tunneled through, i.e. the model needs to activate disparate regions of its knowledge manifold simultaneously, which seems like it might be difficult to do. I’m not sure this is a good way of looking at things though, because inference isn’t topology and I’m not sure that abstract reasoning can be reduced to finding ways to connect concepts that have been learned in isolation.
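As a toy illustration of the "approximately orthogonal subspaces" part (my framing, not the OP's): in a high-dimensional space, random unit vectors are nearly orthogonal, which is the intuition behind packing far more concepts than dimensions in superposition with little interference. A quick numpy check, with the hidden size and concept count picked arbitrarily:

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_concepts = 4096, 10_000  # hypothetical hidden size and number of concepts

    # Random unit vectors standing in for learned concept directions.
    concepts = rng.standard_normal((n_concepts, d))
    concepts /= np.linalg.norm(concepts, axis=1, keepdims=True)

    # Cosine similarity between random pairs concentrates around 0 (std ~ 1/sqrt(d)),
    # so many more than d concepts can occupy near-orthogonal directions.
    pairs = rng.integers(0, n_concepts, size=(5, 2))
    for i, j in pairs:
        print(i, j, round(float(concepts[i] @ concepts[j]), 4))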
You are who, what and where you are by virtue of historical accident. The serfs of the Middle Ages sure as shit didn’t want that economy either, nor did the unwashed masses, dying of dysentery and cholera at such an alarming rate that the populations of major cities were only sustained through mass migration during industrialization in the 18th and 19th centuries. Nor did more or less any slave in human history.
The last ~150 years of economic freedom and prosperity for a large percentage of the population, widespread across industrialized economies, is the exception, not the rule, and the difference is and always has been that industrialized economies have required large numbers of skilled specialists to make the systems they developed work.
If you change that calculus, there is no historical precedent suggesting that human societies won’t revert to the mean.
idk, I feel like I’ve read a few things from history about people rejecting the economy that was forced on them and demanding better. maybe it was AI generated?
I intensely dislike the author’s smug, self-satisfied sense of superiority.