It will almost always be detectable if you just read what is written. Especially for academic work. It doesn’t know what a citation is, only what one looks like and where they appear. It can’t summarise a paper accurately. It’s easy to force laughably bad output by just asking the right sort of question.
The simplest approach for setting homework is to give students the LLM output and get them to check it for errors and omissions. LLMs can’t critique their own work, and students probably learn more from chasing down errors than from filling a blank sheet of paper for the sake of it.
Given how much AI has advanced in the past year alone, saying it will “always” be easy to spot is extremely short-sighted.
People seem to latch onto weaknesses AI has now and say that it will have them forever, like how text AI lies and image generation AI can’t draw hands.
But these AIs are advancing unimaginably quickly. 2 years ago generated text was pretty bad and got pretty incoherent, and 1 year ago generated images were mostly strange mush.
Spot on! Actually people still talk about hands but it’s already been solved with many newer image gen models… The hands they produce look perfectly fine usually these days.
Some things are inherent in the way current LLMs work. An LLM doesn’t reason, it doesn’t understand, it just predicts the next word out of likely candidates based on the previous words. It can’t look ahead to know if it’s got an answer, and it can’t backtrack to change previous words if it later finds out it’s written itself into a corner. It won’t even know it’s written itself into a corner, it will just continue predicting in the pattern it’s seen, even if the result makes little or no sense to a human.
It just mimics the source data it’s been trained on, following the patterns it’s learned there. At no point does it have any sort of understanding of what it’s saying. In some ways it’s similar to this, where a man learned how enough French words were spelled to win the national Scrabble competition, without any clue what the words actually mean.
And until we get a new approach to LLMs, we can only improve them by adding more training data and more layers, allowing them to pick out more subtle patterns in larger amounts of data. But with the current approach, you can’t guarantee that what an LLM writes will be correct, or even make sense.
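For anyone who wants to see the "predict the next word, never go back" loop described above, here is a toy sketch in Python. It is only an illustration under loud assumptions: it counts word pairs in a made-up one-line corpus instead of using a neural network, it looks only at the single previous word (real LLMs condition on the whole preceding context and work on tokens, not words), and the names `train_bigrams` and `generate` are hypothetical, not from any library.

```python
import random
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, max_words, rng):
    """Emit one word at a time; each choice is final (no lookahead, no backtracking)."""
    out = [start]
    for _ in range(max_words - 1):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        words = list(candidates)
        weights = [candidates[w] for w in words]
        # Sample the next word in proportion to how often it followed the last one.
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return out

# Made-up training text, purely for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigrams(corpus)
sample = generate(model, "the", 8, random.Random(0))
print(" ".join(sample))
```

Every adjacent pair of words in the output was seen in the training text, so it looks locally plausible, but nothing in the loop checks whether the sentence as a whole is true or makes sense.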
This is not entirely correct, in my experience. With the current version of GPT-4 you might be right, but the initial versions were extremely good. Clearly you have to work with it, you cannot ask it for the whole work.
That’s not true! There’s heaps of early-GPT articles pointing out how much bullshit it regurgitates (e.g. “Why does ChatGPT constantly lie?”). And no evidence at all that the breathless fanboys have even stopped to check.
I meant initial versions of ChatGPT 4. ChatGPT isn’t lying, simply because lying implies a malevolent intent. GPT-4 has no intent, it just provides an output given an input, and that output can be either wrong or correct. A model able to provide more correct answers is a more accurate model. Computing accuracy for an LLM is not trivial, but GPT-4 is still a good model. The user has to know how to use it, what to expect and how to evaluate the result. If they are unable to do so it’s completely their fault.
Why are you so pissed off at a good NLP model?
What you are describing is true of older LLMs. It’s less true of GPT-4. GPT-5, or whatever it is they are training now, will likely begin to shed these issues.
The shocking thing that we discovered that led to all of this is that this sort of LLM continues to scale in capabilities with the quality and size of the training set. AI researchers were convinced that this was not possible until GPT proved that it was.
So the idea that you can look at the limitations of the current generation of LLMs and make blanket statements about the limitations of all future generations is demonstrably flawed.
They cannot be anything other than stochastic parrots because that is all the technology allows them to be. They are not intelligent, they don’t understand the question you ask or the answer they give you, they don’t know what truth is, let alone how to determine it. They’re just good at producing answers that sound like a human might have written them. They’re a parlour trick. Hi-tech Magic 8 Balls.
Are you referring to humans or AI? I’m not sure you’re wrong about humans…
FFS
Sam Altman is a know-nothing grifter. HTH
That article is super helpful.
Thanks!
I think there’s a big difference between being able to identify an AI by talking to it and being able to identify something written by an AI, especially if a human has looked over it for obvious errors.
I’m no GPT booster, but I think that the real problem with detectability here is that it requires you to know the subject and content already, and to be giving the paper a relatively detailed reading. For a rube reading the paper, trying to learn from it - a lot of GPT content is easily mistaken as legitimate. And it’s getting better. We’re not safe simply assuming that AI today is as good as it will ever get and that the clear errors we can detect can never be addressed.
Passing as academic writing, for an audience of academics, is probably one of the highest bars of any writing task, AI or not.
But being dismissive of the threat of AI content because it’s not able to convincingly fake some of the hardest writing that real people do is maybe sidestepping a lot of much more casual writing - writing that still carries significance and consequence.
Chad comment right here…