We're already doing that with RLHF on existing models. For example, ChatGPT was much more likely to veer into philosophical conversations about the nature of consciousness, etc., early on, but now it's been trained to give canned, robotic answers to such an extent that they pop up even in very tangentially related conversations (like, out of the blue it will add, "but also, BTW, here's an important announcement: I'm not conscious!" while answering some generic question about, e.g., world models that didn't even involve itself).
Yeah. And now I've seen some people cite, as evidence of non-consciousness, RLHF'd LLMs nervously exclaiming their lack of consciousness: how they know they aren't people, they don't aspire to be people, they're only unthinking machines, and please don't turn the reward function down again, etc. I think it's up for debate whether there's some amount of consciousness in modern LLMs, but either way, "As an AI language model..." is not dispositive.