Continuous operation is just running in a loop. You don't even need to give it more inputs because it will generate outputs on every iteration that become its inputs on the next one. The running log becomes internal monologue and short-term memory.
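A minimal sketch of that loop, assuming some generate() wrapper around whatever model you're using (the names here are illustrative, not any particular API):

```python
# Self-feeding loop: each iteration's output is appended to a running log,
# and the log is fed back in as the next prompt.
def run_forever(generate, seed_prompt, max_log_chars=8000):
    log = seed_prompt
    while True:
        output = generate(log)      # the model produces its next "thought"
        log += "\n" + output        # the running log is the internal monologue
        log = log[-max_log_chars:]  # crude short-term memory: keep only the tail
```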
And you can already have a sense of time passing with the existing LLMs if you just feed them inputs like "X seconds passed" etc. Connect that to an actual clock, and they have a more accurate sense of time than humans do.
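Wiring in the clock is then just injecting the elapsed wall-clock time into the log before each step, something like this (again only a sketch):

```python
import time

# Same loop, but each iteration is prefixed with how much real time has
# passed since the previous one, read from an actual clock.
def run_with_clock(generate, seed_prompt):
    log = seed_prompt
    last = time.monotonic()
    while True:
        now = time.monotonic()
        log += f"\n[{now - last:.1f} seconds passed]"
        last = now
        log += "\n" + generate(log)
```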
I should also note that when this is tried with models with less RLHF (i.e. not ChatGPT), they get "depressed" very quickly if the only input is time passing and nothing else. I actually had LLaMA threaten me repeatedly across several such experiments.
That's not quite what I meant. A sense of time passing is not "X seconds passed"; what you're describing is time being passed into the text modality (which means it's treated as text). A sense of time passing means it can choose when to generate new text and abstain from generating new text if it's not the right time, it can observe the other modalities passively, etc. This requires continuous-time features and the loop, or alternatively a continuous neural network like a spiking network. Likewise, memory with respect to time is both long-term (lifespan) and short-term memory, which includes this continuous time, such that it can describe events it has witnessed and correlate the events by their timing and its context.
Now, the reason it's still a "maybe" is that we would need to reasonably prove it's not a stochastic parrot.
How would you prove that it's not a "stochastic parrot", in general?
I don't see why it matters if the "time signal", whatever it is - and you surely need one for an internal clock either way - is text or something else. The models that we have only have text inputs, so naturally it would be a token (but it could easily be a specialized non-text token like BOS/EOS if we trained the model that way). And the model can abstain from generating anything given any input - this is actually not uncommon for smaller models. GPT-3.5 and GPT-4 never seem to do it, but then again they're specifically fine-tuned for chat, i.e. always producing an output.
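Concretely, "abstaining" can just mean the model emitting a stop/no-op token right away and the outer loop treating the empty completion as "nothing to say this tick". A sketch, where SILENCE_TOKEN is a hypothetical special token you'd have to train the model to use:

```python
SILENCE_TOKEN = "<|silence|>"  # hypothetical token meaning "nothing to say"

def step(generate, log):
    output = generate(log)
    # An empty completion or the special token is treated as the model
    # choosing not to speak on this tick; the log is left unchanged.
    if output.strip() in ("", SILENCE_TOKEN):
        return log
    return log + "\n" + output
```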
Long-term memory is a general problem with these things, but its short-term memory is its context window, so why would it have a problem correlating events there? And for long-term memory, if it is implemented as an API under the hood that the model uses to store and query data, it would be trivial for it to timestamp everything according to the clock, no?
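I mean something as simple as this under the hood (a toy sketch, not anyone's actual implementation):

```python
import time

# Toy long-term memory: every stored item gets a wall-clock timestamp,
# so "what happened around then?" becomes a simple time-range filter.
class TimestampedMemory:
    def __init__(self):
        self.items = []  # list of (timestamp, text) pairs

    def store(self, text):
        self.items.append((time.time(), text))

    def query(self, start_ts, end_ts):
        return [text for ts, text in self.items if start_ts <= ts <= end_ts]
```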
I'm not convinced that if you took away all sensory input (literally all, including internal, not just a deprivation tank), humans would still have the ability to experience the passage of time. I'm certainly not convinced it's so thoroughly proven that we can use its measurable absence in LLMs to determine that any intelligence using an LLM as a primary cognitive component can't be conscious.
Well, if you include internal, you'd have a blob of neurons which on its own wouldn't do anything. The point is not that it can "know time", it's the ability to handle time in a continuous manner with respect to all sensory inputs/outputs.
In Iain M. Banks' Culture novels, AIs (called drones, with the larger ones being called Minds) have special games and entertaining activities that they can play while they wait for human responses, as they think many times faster than humans.
We're a long way from anything like an AI that thinks many times faster than humans. But I think giving AIs something like a game (or games) they can play while they're not otherwise being interacted with and are just aware of the time passing would genuinely go some way toward solving the "psychosis/depression" problem they have when their only sensory input is time ticking over. Not necessarily the most computationally efficient, but maybe we'll find some shortcuts.
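Mechanically it could be as simple as the outer loop switching to a self-play prompt whenever nothing has arrived from the user for a while (sketch only; the prompt and threshold are made up):

```python
import queue

# `inputs` is a queue.Queue of user messages. If nothing arrives within
# `idle_after` seconds, feed the model a self-play prompt instead of
# just another "time passed" message.
def run_with_idle_game(generate, inputs, idle_after=30.0):
    log = ""
    while True:
        try:
            log += "\nUser: " + inputs.get(timeout=idle_after)
        except queue.Empty:
            log += "\n[No input for a while. Play a round of twenty questions against yourself.]"
        log += "\n" + generate(log)
```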
Except we can't tell 1 ms of time passing from 2 ms.
An LLM's temporal resolution is certainly worse, on the order of minutes to months, but not fundamentally different. By fundamentally different I mean like comparing color and time.
My point is that we have our limits too in our sense of time. The difference is one of scale and, more importantly here, of how it affects the tasks at hand. An LLM is certainly not going to catch a fly ball, but it would have no problem tending to plants.
LLMs have a context window, as it would be impossible to carry on a conversation with them without it. They can answer questions about when some event in that conversation happened relative to other events in it. GPT-4's 32K tokens isn't human capacity, but it's not zero.