> Let's follow one example: Nigeria is the most populous country in Africa. In Abstract Wikipedia, this might be stored as: Z27243(Q1033, Q138758272, Q6256, Q15, Z27243K5)
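The quoted constructor call can be pictured as a typed frame plus per-language renderers. A minimal sketch in Python, with invented names — the real Wikifunctions Z27243 object and its key structure are not reproduced here, only the idea that Q-items fill slots in an abstract "superlative" frame:

```python
# Hypothetical sketch of an Abstract Wikipedia-style constructor.
# Field names and the renderer are illustrative, not real Wikifunctions APIs.
from dataclasses import dataclass

@dataclass
class Superlative:
    subject: str    # e.g. "Nigeria"  (Q1033)
    quality: str    # e.g. "populous" (Q138758272)
    category: str   # e.g. "country"  (Q6256)
    location: str   # e.g. "Africa"   (Q15)

def render_en(s: Superlative) -> str:
    # A language-specific renderer turns the abstract frame into text;
    # a render_fr, render_sw, etc. would share the same input frame.
    return f"{s.subject} is the most {s.quality} {s.category} in {s.location}."

fact = Superlative("Nigeria", "populous", "country", "Africa")
print(render_en(fact))  # → Nigeria is the most populous country in Africa.
```

The point of the design is that the frame is language-independent: one stored fact, many renderers.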
Haha that's like John Wilkins' "Real Character, and a Philosophical Language"
It's not that different from how LLM tokens work, only in a tree structure as opposed to a plain sequence. Having a tree structure makes it easier to formally define rewrite rules (which is key for interpretability), as opposed to learning them from data as LLMs do.
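To make "formally define rewrite rules" concrete, here is a toy sketch, with invented structures — not real Abstract Wikipedia code. A tree node is a `(label, children)` tuple, and each rule is an inspectable function keyed by node label, which is what makes the system interpretable by construction:

```python
# Toy bottom-up tree rewriting; labels and rules are illustrative only.
def rewrite(node, rules):
    """Rewrite children first, then apply the rule (if any) for this label."""
    if isinstance(node, str):          # leaves pass through unchanged
        return node
    label, children = node
    node = (label, [rewrite(c, rules) for c in children])
    return rules.get(label, lambda n: n)(node)

# One explicit, human-readable rule: realize a Superlative node as English.
rules = {
    "Superlative": lambda n: "{} is the most {} {} in {}.".format(*n[1]),
}

tree = ("Superlative", ["Nigeria", "populous", "country", "Africa"])
print(rewrite(tree, rules))  # → Nigeria is the most populous country in Africa.
```

In an LLM the equivalent transformation is distributed across learned weights; here it is a single rule you can read and edit.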
Also, tokens don't represent meaning in themselves; they are assigned points in a multidimensional space, and can only represent meaning in the network as a whole, combined with other tokens in context and order.
And the abstract concepts of Abstract Wikipedia are human-defined, top-down ways of carving the world into distinct categories which make some kind of logical sense, whereas LLMs work bottom-up and create overlapping, non-hierarchical, probabilistic networks of connections, with nearly no imposed structure except the principle that you shall know a token by the company it keeps.
But you can type them both out with keys on a keyboard so in that sense I guess they're not that different.
For context, this was proposed way back in 2013 (https://meta.wikimedia.org/wiki/Abstract_Wikipedia), when machine translation was just plain bad (and LLMs were known only in academic circles). Surprised that AWiki is now active though.
https://en.wikipedia.org/wiki/La_Ricerca_della_Lingua_Perfet... is a great intro to the weird and wonderful world of abstract/universal/ideal/a priori languages.