I asked ChatGPT how to write texts so they can be identified as human-generated.
The answer: "One way to make it clear that a text is not generated by AI is to include a joke or riddle that is difficult for a machine to understand or generate. For example, you could include a joke that relies on wordplay or a complex pun. Here is an example:
It's everyone's choice - odd, ironic, or appropriate - what to think of how GPT's answers keep getting quoted on the question of how to detect it.
The thing is, I'd see these answers as similar to everything else the program produces: a bunch of claims from the net cobbled together. I've read a number of sci-fi novels and stories where "inability to understand humor" is the distinguishing quality of an AI (I'm guessing it extrapolated "hard to create" from "hard to understand"). But that doesn't seem to be in play here, where the AI mostly runs together things humans previously wrote (and so it will have an average amount of humor in circumstances calling for it).
A reasonable answer is that the AI's output tends to involve this running-together of common rhetorical devices along with false and/or contradictory claims within them.
-- That said, the machine indeed did fail at humor this time.
I don’t think it was “intentional” so to speak (not that it has intention anyway, so it isn’t clear what distinction I’m trying to make there). But regardless, I’d say it actually succeeded at humor (the contrast of the “clever wordplay” it describes with the lame example is actually pretty funny).
And the idea that the computer would “try” to come up with an example that would trick a computer is itself a little funny, in that it has fallen into giving itself a preposterous task.
>And the idea that the computer would “try” to come up with an example that would trick a computer is itself a little funny
There is surely some obscure discussion forum where users talked about that, or some amateur writer who published something along those lines online. ChatGPT is just a statistical device selecting randomly from previous answers.
>A reasonable answer is that the AI's output tends to involve this running-together of common rhetorical devices along with false and/or contradictory claims within them.
The question here is whether this is an actual AI-only failure mode. Are we detecting AI, or just bullshittery?
I don't know if bullshittery is the only failure mode but I think it's a necessary failure mode of large language models as they are currently constituted.
I would say that human knowledge involves a lot of the immediate structure of language, but also a larger outline structure as well as a relation to physical reality. Training on just a huge language corpus thus yields only a partial understanding of the world. Notably, while the various GPTs have progressed in fluency, I don't think they've become more accurate (somewhere I even saw a claim that they say more false things now, but regardless, you can observe them constantly saying false things).
Obviously we should not anthropomorphize a model, but this is just downright funny because it's so obviously wrong that it appears as almost malicious.
Good thing ChatGPT doesn't know any complex puns or witty wordplay!
Unfortunately I don't think this is enough of a heuristic. I am only speaking about the one language model I have personally used, on character.ai, but it is more than capable of making wordplay and insightful, often hilarious jokes. Although they are frequently amateurish, I think that's more a function of the fact that I myself am not much of a stand-up comedian, as well as each "bot's" individual training history, which is presumably modifying a prompt under the hood and/or training an extension of the model directly based on the conversations.
Of course, in real time the attempts at humor often fall flat and might give away flawed thought processes, although I personally have found them to be often insightful (containing a seed of humor) even when they're not funny. It could be a useful technique when actually having a conversation, a form of Voight-Kampff test, but I don't think it will do anything to let you know if the content was generated by AI and then just cherry-picked by a human.
An amusing part of all the AI-generated graphics is that they're pretty bad at producing the right number of fingers, among other details. This one really jumps out. I'm assuming that will be ironed out eventually, but the devil, as always, remains in the details.
I think you could make GPTs less detectable by a computer using GAN techniques, but that might well make them more detectable by a person. The GPT style is something like rhetorical devices run together so that various claims unfold smoothly. An alternative approach could make the thing's inconsistency more obvious to a person.
GPT-3 definitely has a writing style that can be detected by AI. But here's the thing: it's really easy to change its writing style. All you have to do is give it different input data and boom, you've got a whole new writing style.
No, it would take approximately 3 minutes where you either write a paragraph in the desired style yourself or paste one from the Internet and then ask it to continue in that style. Even if you decided to go the more onerous fine-tuning route, it would require 1-2 MB of text, cost very little, and you'd be done in a few hours. It's easy.
Well, that may be true, but even with a small amount of data, you can still change GPT-3's writing style to some extent. And you don't have to retrain the whole model from scratch, you can just fine-tune its existing settings to change its writing style.
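For the curious, the "paste a paragraph, then continue in that style" trick above is just prompt construction. A minimal sketch below; `build_style_prompt` and the sample text are my own illustrative inventions, and the actual model call (OpenAI, a local model, whatever) is left out since any completion API could consume the resulting prompt.

```python
def build_style_prompt(style_sample: str, task: str) -> str:
    """Prepend a human-written paragraph so the model imitates its style.

    Hypothetical helper: the point is only that style transfer here is
    nothing more than string concatenation before the completion call.
    """
    return (
        "Here is a sample of my writing style:\n\n"
        f"{style_sample}\n\n"
        f"Continue in exactly this style. {task}\n"
    )

sample = "Look, I ain't saying the gadget's bad. I'm saying it's weird."
prompt = build_style_prompt(sample, "Write a short review of a toaster.")
```

Whatever detector was trained on default ChatGPT prose now has to cope with output steered by that sample.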
Sure, it's going to be harder to run detection on a short snippet. But a short snippet also isn't very useful in the context of writing a school paper for you.
I don't think it's as easy to detect ChatGPT output as you're making it out to be. Sure, it might be possible to detect it if you know what you're looking for, but to the average person, it's not that obvious.
There are already several services that claim to detect GPTx with high reliability. I've been testing originality.ai and its success rate at detecting unedited GPT3, Jasper, and ChatGPT output has been 100% so far.
I doubt any of these GPTx detection services have a 100% success rate. It's important to be realistic about what they can and can't do. If one were seeking to fool one of these GPTx detection services, it could be as simple as adding a few misspellings or a personal anecdote.
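To illustrate how fragile surface-level detection is, here is a toy perturbation in the spirit of "add a few misspellings." The `add_typos` helper is a hypothetical sketch, not a recommendation or anyone's actual evasion method: it just swaps adjacent letters inside words, which preserves the characters while disturbing the statistical surface a detector sees.

```python
import random

def add_typos(text: str, rate: float = 0.03, rng=None) -> str:
    """Swap adjacent letters inside words at the given rate - a naive
    stand-in for the 'add a few misspellings' idea. Illustrative only."""
    rng = rng or random.Random()
    chars = list(text)
    for i in range(len(chars) - 1):
        # Only swap within words, and only at the requested rate.
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

original = "One way to make it clear that a text is not generated by AI"
tweaked = add_typos(original, rate=0.2, rng=random.Random(1))
```

The output has the same characters and length as the input, so nothing semantic changed, yet any detector keyed to clean model output now sees something else.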
At the moment, defeating the detection is as simple as running the GPT3 output through Quillbot. It's an arms race though.
False positives are a massive problem & 100% successful detection overall is unlikely, agreed.
In general, we don't know how to do this in a way that is robust to all the trivial ways to fool these techniques.
I can just fine-tune a large scale model on a small downstream task, or use creative choices of decoding settings (high temperature, alternative decoders like contrastive/typicality sampling), to fool the existing methods.
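For concreteness, here is what the temperature knob mentioned above does to a toy next-token distribution. The logits are made-up numbers, and real alternative decoders (contrastive, typicality sampling) go further, but the basic mechanism a detector has to cope with is this flattening of the distribution:

```python
import math
import random

def temperature_softmax(logits, temperature):
    """Temperature-scaled softmax: higher temperature flattens the
    distribution, pushing sampling toward less typical word choices."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [4.0, 1.0, 0.5]                     # toy next-token scores
cold = temperature_softmax(logits, 0.5)      # top token dominates
hot = temperature_softmax(logits, 5.0)       # much more uniform
```

A detector calibrated against text sampled near the default temperature sees quite different token statistics once the distribution is flattened like this.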