The research from Purdue University, first spotted by news outlet Futurism, was presented earlier this month at the Computer-Human Interaction Conference in Hawaii and looked at 517 programming questions on Stack Overflow that were then fed to ChatGPT.
“Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose,” the new study explained. “Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style.”
Disturbingly, programmers in the study didn’t always catch the mistakes being produced by the AI chatbot.
“However, they also overlooked the misinformation in the ChatGPT answers 39% of the time,” according to the study. “This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers.”
Who would have thought that an artificial intelligence trained on human intelligence would be just as dumb
Hm. This is what I got.
I think about 90% of the screenshots we see of LLMs failing hilariously are doctored. Lemmy users really want to believe it’s that bad though.
Edit:
I’ve had lots of great experiences with ChatGPT, and I’ve also had it hallucinate things.
I saw someone post an image of a simplified riddle, where ChatGPT tried to solve it as if it were the entire riddle, but it added extra restrictions and gave a confusing response. I tried it for myself and got an even better answer.
Prompt (no prior context except saying I have a riddle for it):
A man and a goat are on one side of the river. They have a boat. How can they go across?
Response:
The man takes the goat across the river first, then he returns alone and takes the boat across again. Finally, he brings the goat’s friend, Mr. Cabbage, across the river.
I wish I was witty enough to make this up.
I reproduced that one and so I believe that one is true.
I looked up the whole riddle and see how it got confused.
It happened on 3.5 but not 4.
Interesting! What did 4 say?
Evidently I didn’t save the conversation but I went ahead and entered the exact prompt above into GPT-4. It responded with:
The man can take the goat across the river in the boat. After reaching the other side, he can leave the goat and return alone to the starting side if needed. This solution assumes the boat is capable of carrying at least the man and the goat at the same time. If there are no further constraints like a need to transport additional items or animals, this straightforward approach should work just fine!
Thanks for sharing!
Yesterday, someone posted a doctored one on here, showing that everyone eats it up even if you use a ridiculous font in your poorly doctored photo. People who want to believe are quite easy to fool.
Or you missed the point that this was a joke?
Holy fuck did it just pass the Turing test?
I actually had the opposite the other day where the code only broke on my machine…
In the end I just commented it out. I don’t see why everybody else should have working code and not me.
“Major new Technology still in Infancy Needs Improvements”
– headline every fucking day
“Corporation using immature technology in production because it’s cool”
More news at eleven
This is scary because up to now, all software released worked exactly as intended so we need to be extra special careful here.
Yes, and we never have and never will put lives in the hands of software developers before!
/s…for this comment and the above one, for anyone who needs it
unready technology that spews dangerous misinformation in the most convincing way possible is being massively promoted
Yeah, because no human would convincingly lie on the internet. Right, Arthur?
It’s literally built on what confidently incorrect people put on the internet. The only difference is that there are constant disclaimers on it saying it may give incorrect information.
Anyone too stupid to understand how to use it is too stupid to use the internet safely anyways. Or even books for that matter.
Holy mother of false equivalence. Google is not supposed to be a random dude on the Internet, it’s supposed to be a reference tool, and for the most part it was a good one before they started enshittifying it.
Google is a search engine. It points you to web pages that are made by people. Many times, the people who make those websites have put things on them that are knowingly or unknowingly incorrect but said in an authoritative manner. That was all I was saying, nothing controversial. That’s been a known fact for a long time. You can’t just read something on a single site and then be sure that it has to be true. I get that there are people who strangely fall in love with specific websites and think they are absolute truth, but that’s always been a foolish way to use the internet.
A great example of people believing blindly is all these horribly doctored Google AI images saying ridiculous things. There are so many idiots that think every time they see a screenshot of Google AI saying something absurd that it has to be true. People have even gone so far as to use ridiculous fonts just to point out how easy it is to get people to trust anything. Now there’s a bunch of idiots that think all 20 or so Google AI mistakes they’ve seen are all genuine, so much so that they think almost all Google AI responses are incorrect. Some people are very stupid. Sorry to break it to you, but LLMs are not the first thing to put incorrect information on the internet.
“Will this technology save us from ourselves, or are we just jerking off?”
You have no idea how many times I mentioned this observation from my own experience and people attacked me like I called their baby ugly
ChatGPT in its current form is good help, but nowhere near ready to actually replace anyone
A lot of firms are trying to outsource their dev work overseas to communities of non-English speakers, and then handing the result off to a tiny support team.
ChatGPT lets the cheap low skill workers churn out miles of spaghetti code in short order, creating the illusion of efficiency for people who don’t know (or care) what they’re buying.
Yeap… another brilliant short-term strategy to catch a few eager fools; it won’t last mid-term
The compiler is such a racist boomer it won’t make my program.
GPT-2 came out a little more than 5 years ago; it answered 0% of questions accurately and couldn’t string a sentence together.
GPT-3 came out a little less than 4 years ago and was kind of a neat party trick, but I’m pretty sure answered ~0% of programming questions correctly.
GPT-4 came out a little less than 2 years ago and can answer 48% of programming questions accurately.
I’m not talking about morality, or creativity, or good/bad for humanity, but if you don’t see a trajectory here, I don’t know what to tell you.
Seeing the trajectory is not the ultimate answer to anything.
Perhaps there is some line between assuming infinite growth and declaring that this technology that is not quite good enough right now will therefore never be good enough?
Blindly assuming no further technological advancements seems equally as foolish to me as assuming perpetual exponential growth. Ironically, our ability to extrapolate from limited information is a huge part of human intelligence that AI hasn’t solved yet.
will therefore never be good enough?
no one said that. but someone did try to reject the fact it is demonstrably bad right now, because “there is a trajectory”.
I appreciate the XKCD comic, but I think you’re exaggerating that other commenter’s intent.
The tech has been improving, and there’s no obvious reason to assume that we’ve reached the peak already. Nor is the other commenter saying we went from 0 to 1 and so now we’re going to see something 400x as good.
I think the one argument for the assumption that we’re near peak already is the entire issue of AI learning from AI input. I think Numberphile discussed a maths paper that said that to achieve the accuracy that we want, there is simply not enough data to train it on.
That’s of course not to say that we can’t find alternative approaches
We’re close to peak using current NN architectures and methods. All this started with the discovery of transformer architecture in 2017. Advances in architecture and methods have been fairly small and incremental since then. The advancements in performance have mostly just been throwing more data and compute at the models, and diminishing returns have been observed. GPT-3 cost something like $15 million to train. GPT-4 is a little better and cost something like $100 million to train. If the next model costs $1 billion to train, it will likely be a little better.
I appreciate the XKCD comic, but I think you’re exaggerating that other commenter’s intent.
i don’t think so. the other commenter clearly rejects the critique(1) and implies that the existence of an upward trajectory means it will one day overcome the problem(2).
while (1) is well documented fact right now, (2) is just wishful thinking right now.
hence the comic, because “the trajectory” doesn’t really mean anything.
In general, “The technology is young and will get better with time” is not just a reasonable argument, but almost a consistent pattern. Note that XKCD’s example is about events, not technology. The comic would be relevant if someone were talking about events happening, or something like sales, but not about technology.
Here, I’m not saying that you’re necessarily right or they’re necessarily wrong, just that the comic you shared is not a good fit.
In general, “The technology is young and will get better with time” is not just a reasonable argument, but almost a consistent pattern. Note that XKCD’s example is about events, not technology.
yeah, no.
try to compare horse speed with a ford model t and blindly extrapolate that into the future. look at moore’s law. technology does not just grow upwards if you give it enough time; most of it has some kind of limit.
and it is not out of realm of possibility that llms, having already stolen all of human knowledge from the internet, having found it is not enough and spewing out bullshit as a result of that monumental theft, have already reached it.
that may not be the case for every machine learning tool developed for some specific purpose, but blind assumption it will just grow indiscriminately, because “there is a trend”, is overly optimistic.
I don’t think continuing further would be fruitful. I imagine your stance is heavily influenced by your opposition to, or dislike of, AI/LLMs
oh sure. when someone says “you can’t just blindly extrapolate a curve”, there must be some conspiracy behind it, it absolutely cannot be because you can’t just blindly extrapolate a curve 😂
That comes off as disingenuous in this instance.
The study is using 3.5, not version 4.
4 produces inaccurate programming answers too
Obviously. But it is FAR better yet again.
Not really. I ask it questions all the time and it makes shit up.
Yes. But it is better than 3.5 without any doubt.
In what year do you estimate AI will have 90% accuracy?
No clue? Somewhere between a few years (assuming some unexpected breakthrough) or many decades? The consensus from experts (of which I am not) seems to be somewhere in the 2030s/40s for AGI. I’m guessing accuracy probably will be more on a topic by topic basis, LLMs might never even get there, or only related to things they’ve been heavily trained on. If predictive text doesn’t do it then I would be betting on whatever Yann LeCun is working on.
Given the data points you made up, I feel it’s safe to assume that this plateau will now be a 10 year stretch
Lemmy seems to be very near-sighted when it comes to the exponential curve of AI progress; I think this is because the community is very anti-corp
We only need to keep doing incremental improvements in the technology and avoid destroying ourselves in the meantime. That’s all it takes for us to find ourselves in the presence of superintelligent AI one day.
My experience with an AI coding tool today.
Me: Can you optimize this method.
AI: Okay, here’s an optimized method.
Me seeing the AI completely removed a critical conditional check.
Me: Hey, you completely removed this check with variable xyz
Ai: oops you’re right, here you go I fixed it.
It did this 3 times on 3 different optimization requests.
It was 0 for 3
Although there were some good suggestions once you got past the blatant first error
My favorite is when I ask for something and it gets stuck in a loop, pasting the same comment over and over
I will resort to ChatGPT for coding help every so often. I’m a fairly experienced programmer, so my questions usually tend to be somewhat complex. I’ve found that it’s extremely useful for those problems that fall into the category of “I could solve this myself in 2 hours, or I could ask AI to solve it for me in seconds.” Usually, I’ll get a working solution, but almost every single time, it’s not a good solution. It provides a great starting-off point to write my own code.
Some of the issues I’ve found (speaking as a C++ developer) are: Variables not declared “const,” extremely inefficient use of data structures, ignoring modern language features, ignoring parallelism, using an improper data type, etc.
ChatGPT is great for generating ideas, but it’s going to be a while before it can actually replace a human developer. Producing code that works isn’t hard; producing code that’s good requires experience.
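To illustrate the “works but isn’t good” distinction, here’s a contrived example (in Python rather than C++, and made up for this comment, not taken from any actual LLM output): both versions below return the same answer, but only one uses its data structures sensibly.

```python
# Both functions return the same answer; only one is good code.

def common_items_slow(a, b):
    # The kind of output an LLM often produces: correct, but O(n*m),
    # because "x in b" on a list scans the whole list every time.
    result = []
    for x in a:
        if x in b:
            result.append(x)
    return result

def common_items_fast(a, b):
    # Same result in O(n + m) by picking the right data structure.
    b_set = set(b)
    return [x for x in a if x in b_set]

print(common_items_fast([1, 2, 3], [2, 3, 4]))  # [2, 3]
```

Both versions pass a basic unit test; it takes experience (or a profiler) to see why the first one is a problem on large inputs.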
This has been my experience as well. If you already know what you are doing, LLMs can be a great tool. If you are inexperienced, you cannot assess the quality nor the accuracy of the answers, and are using the LLM to replace your own learning.
I like to draw the parallel to people that have learnt to paint only using digital tools. They often show a particular colouring that shows a lack of understanding of colour theory. Because pipette tools mean that you never have to mix colours, you never have to learn to do so. Painting with physical paint isn’t superior, but it presents a hurdle (mixing paint) that is crucial to learn to overcome. Many digital-only artists will still have learnt on traditional media. Once you have the knowledge, the pipette and colour pickers are just a tool, no longer inhibiting anything.
Yeah it’s wrong a lot but as a developer, damn it’s useful. I use Gemini for asking questions and Copilot in my IDE personally, and it’s really good at doing mundane text editing bullshit quickly and writing boilerplate, which is a massive time saver. Gemini has at least pointed me in the right direction with quite obscure issues or helped pinpoint the cause of hidden bugs many times. I treat it like an intelligent rubber duck rather than expecting it to just solve everything for me outright.
Same here. It’s good for writing your basic unit tests, and the explain feature is useful for getting your head wrapped around complex syntax, especially as bad as searching for useful documentation has gotten on Google and ddg.
That’s a good way to use it. Like every technological evolution it comes with risks and downsides. But if you are aware of that and know how to use it, it can be a useful tool.
And as always, it only gets better over time. One day we will probably rely more heavily on such AI tools, so it’s a good idea to adapt quickly.
ChatGPT and github copilot are great tools, but they’re like a chainsaw: if you apply them incorrectly or become too casual and careless with them, they will kick back at you and fuck your day up.
If you don’t know what you are doing, and you give it a vague request hoping it will automatically solve your problem, then you will just have to spend even more time to debug its given code.
However, if you know exactly what needs to be done, and give it a good prompt, then it will reward you with well-written code, a clean implementation, and comments. Consider it an intern or junior developer.
Example of bad prompt: My code won’t work [paste the code], I keep having this error [paste the error log], please help me
Example of (reasonably) good prompt: This code introduces deep recursion and can sometimes cause a “maximum stack size exceeded” error in certain cases. Please help me convert it to use a while loop instead.

I wouldn’t trust an LLM to produce any kind of programming answer. If you’re skilled enough to know it’s wrong, then you should do it yourself; if you’re not, then you shouldn’t be using it.
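For what it’s worth, the refactor that prompt describes looks roughly like this (a made-up Python example, not code from the thread; in Python the failure is a RecursionError rather than “maximum stack size exceeded”):

```python
# Hypothetical before/after for the "convert deep recursion to a while loop" prompt.

def sum_recursive(values):
    # Recurses once per element, so a long enough list
    # blows past the interpreter's recursion limit.
    if not values:
        return 0
    return values[0] + sum_recursive(values[1:])

def sum_iterative(values):
    # Same result with constant stack depth.
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total

print(sum_iterative(list(range(100_000))))  # fine; the recursive version would crash here
```

A narrow, mechanical transformation like this is exactly the kind of request an LLM tends to handle well.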
I’ve seen plenty of examples of specific, clear, simple prompts that an LLM absolutely butchered by using libraries, functions, classes, and APIs that don’t exist. Likewise with code analysis where it invented bugs that literally did not exist in the actual code.
LLMs don’t have a holistic understanding of anything—they’re your non-programming, but over-confident, friend that’s trying to convey the results of a Google search on low-level memory management in C++.
If you’re skilled enough to know it’s wrong, then you should do it yourself, if you’re not, then you shouldn’t be using it.
Oh I strongly disagree. I’ve been building software for 30 years. I use copilot in vscode and it writes so much of the tedious code and comments for me. Really saves me a lot of time, allowing me to spend more time on the complicated bits.
I’m closing in on 30 years too, started just around '95, and I have yet to see an LLM spit out anything useful that I would actually feel comfortable committing to a project. Usually you end up having to spend as much time—if not more—double-checking and correcting the LLM’s output as you would writing the code yourself. (Full disclosure: I haven’t tried Copilot, so it’s possible that it’s different from Bard/Gemini, ChatGPT and what-have-you, but I’d be surprised if it was that different.)
Here’s a good example of how an LLM doesn’t really understand code in context and thus finds a “bug” that’s literally mitigated in the line before the one where it spots the potential bug: https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-for-intelligence/ (see “Exhibit B”, which links to: https://hackerone.com/reports/2298307, which is the actual HackerOne report).
LLMs don’t understand code. It’s literally your “helpful”, non-programmer friend—on steroids—cobbling together bits and pieces from searches on SO, Reddit, DevShed, etc. and hoping the answer will make you impressed with him. Reading the study from TFA (https://dl.acm.org/doi/pdf/10.1145/3613904.3642596, §§5.1-5.2 in particular) only cements this position further for me.
And that’s not even touching upon the other issues (like copyright, licensing, etc.) with LLM-generated code that led to NetBSD simply forbidding it in their commit guidelines: https://mastodon.sdf.org/@netbsd/112446618914747900
Edit: Spelling
I’m very familiar with what LLMs do.
You’re misunderstanding what copilot does. It just completes a line or section of code. It doesn’t answer questions - it just continues a pattern. Sometimes quite intelligently.
Shoot me a message on discord and I’ll do a screenshare for you. #locuester
It has improved my quality and speed significantly. More so than any other feature since intellisense was introduced (which many back then also frowned upon).
Fair enough, and thanks for the offer. I found a demo on YouTube. It does indeed look a lot more reasonable than having an LLM actually write the code.
I’m one of the people that don’t use IntelliSense, so it’s probably not for me, but I can definitely see why people find that particular implementation useful. Thanks for catching and correcting my misunderstanding. :)
Example of (reasonably) good prompt: This code introduces deep recursion and can sometimes cause a “maximum stack size exceeded” error in certain cases. Please help me convert it to use a while loop instead.

That sounds like those cases on YouTube where the correction to the code was shorter than the prompt hahaha
I’ve found chatgpt reasonably good for one thing: Generating regex-patterns. I don’t know regex for shit, but if I ask for a pattern described with words, I get a working pattern 9/10 times. It’s also a very easy use-case to double check.
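For example (a hypothetical prompt/result pair, invented here): asking for “a pattern that matches an ISO date like 2024-05-23” might come back as something like this, which takes seconds to verify:

```python
import re

# Hypothetical pattern for "match an ISO date like 2024-05-23".
# Easy to double-check against a few known-good and known-bad strings.
iso_date = re.compile(r"^\d{4}-\d{2}-\d{2}$")

print(bool(iso_date.match("2024-05-23")))  # True
print(bool(iso_date.match("23/05/2024")))  # False
```

Note that it still accepts nonsense like 2024-99-99, which is exactly the kind of edge case you have to check yourself.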
I just use it to get ideas about how to do something, or ask it to write short functions for stuff I wouldn’t know that well. I tried using it to create a graphical UI for a script, but that was a constant struggle to keep it on track. It managed to create something that kind of worked, but it was like trying to hold 2 magnets of opposing polarity together, and I had to constantly reset the conversation after it got “corrupted”.
It’s a useful tool if you don’t rely on it, use it correctly, and don’t trust it too much.
This has been true for code you pull from posts on stackoverflow since forever. There are some good ideas, but they a. Aren’t exactly what you are trying to solve and b. Some of the ideas are incomplete or just bad and it is up to you to sort the wheat from the chaff.
Yeah, I’ve been trying to recreate the same GUI tools with every version, and it’s getting much better, but it still struggles. The Python-specific GPT actually manages to create what I ask for and can make changes once it’s got the base established; I have to correct a few little glitches, but nothing too terrible.
For functions like saving all the info in text boxes to JSON and filling that info back in when load is pressed, it never fails. Making little test scripts for functions or layouts saves me huge amounts of mental effort.
It’s like image gen, you have to know what to expect to get the most out of it, ask for something it finds difficult it’s easy to confuse it but ask for things it’s good at and it’ll amaze you.
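Stripped of the GUI toolkit, the save/load feature described above reduces to something like this (a sketch with hypothetical field names, not the commenter’s actual code):

```python
import json

def save_form(fields, path):
    # Dump the current text-box contents to a JSON file.
    with open(path, "w") as f:
        json.dump(fields, f, indent=2)

def load_form(path):
    # Read the contents back to refill the text boxes.
    with open(path) as f:
        return json.load(f)

fields = {"name": "example", "notes": "hello"}
save_form(fields, "form.json")
print(load_form("form.json") == fields)  # True
```

A round-trip this mechanical is exactly the sort of boilerplate these tools reliably get right.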
I couldn’t have said it better
I guess it depends on the programming language… With python, I got very fast great results. But python is all about quick and dirty 😂
I asked ChatGPT for assistance with JavaScript doing HL7 stuff and it was a joke… After the seventh correction I gave up on it (at least for that task)
In Rust, it’s not great. It can’t do proper memory management in the language, which is pretty essential.
Well, if you use the free ChatGPT, it only has knowledge up to 2022; maybe that’s the reason
People down vote me when I point this out in response to “AI will take our jobs” doomerism.
I mean, AI eventually will take our jobs, and with any luck it’ll be a good thing when that happens. Just because Chat GPT v3 (or w/e) isn’t up to the task doesn’t mean v12 won’t be.
I’m not so sure about the “it’ll be good” part. I’d like to imagine a world where people don’t have to work because everything is done by robots but in reality you’ll have some companies that will make trillions while everyone else will go hungry and become poor and homeless.
Yes, that’s exactly the scenario we need to avoid. Automated gay space communism would be ideal, but social democracy might do in a pinch. A sufficiently well-designed tax system coupled with a robust welfare system should make the transition survivable, but the danger with making that our goal is allowing the private firms enough political power that they can reverse the changes.
Yes, this is also true. I see things like UBI as an inevitable necessity, because AI and automation in general will eliminate the need for most companies to employ humans. Our capitalistic system is set up in a way such that a person can sell their ability to work and provide value to the owner class, but if that dynamic is ever challenged on a fundamental level, it will violently collapse when people who can’t get jobs because a robot replaced them either reject automation to preserve the status quo or embrace a new dynamic that provides for the population’s basic needs without requiring them to be productive.
But the way that managers talk about AI makes it sound like the techbros have convinced everybody that AI is far more powerful than it currently is, which is a glorified chatbot with access to unfiltered Google search results.
It could be a good thing, but the price for that is making being unemployed okay.
This begs some reflection: what is a “job”, functionally? What would be needed for losing it to be good?
I suspect a system with jobs would not eradicate jobs, just change them.
If it’s possible for AI to reach that level. We shouldn’t take for granted it’s possible.
I was really humbled when I learned that a cubic mm of human brain matter took over a petabyte to map. It suggests to me that AI is nowhere close to the level you’re describing.
It suggests to me that AI
This is a fallacy. Specifically, I think you’re committing the informal fallacy confusion of necessary and sufficient conditions. That is to say, we know that if we can reliably simulate a human brain, then we can make an artificial sophont (this is true by mere definition). However, we have no idea what the minimum hardware requirements are for a sufficiently optimized program that runs a sapient mind. Note: I am setting aside what the definition of sapience is, because if you ask 2 different people you’ll get 20 different answers.
We shouldn’t take for granted it’s possible.
I’m pulling from a couple decades of philosophy and conservative estimates of the upper limits of what’s possible as well as some decently-founded plans on how it’s achievable. Suffice it to say, after immersing myself in these discussions for as long as I have I’m pretty thoroughly convinced that AI is not only possible but likely.
The canonical argument goes something like this: if brains are magic, we cannot say if humanlike AI is possible. If brains are not magic, then we know that natural processes can create sapience. Since natural processes can create sapience, it is extraordinarily unlikely that it will prove impossible to create it artificially.
So with our main premise (AI is possible) cogently established, we need to ask the question: “since it’s possible, will it be done, and if not why?” There are a great many advantages to AI, and while there are many risks, the barrier of entry for making progress is shockingly low. We are talking about the potential to create an artificial god with all the wonders and dangers that implies. It’s like a nuclear weapon if you didn’t need to source the uranium; everyone wants to have one, and no one wants their enemy to decide what it gets used for. So everyone has the incentive to build it (it’s really useful) and everyone has a very powerful disincentive against forbidding the research (there’s no way to stop everyone who wants to, and so the people who’d listen are the people who would make an AI who’ll probably be friendly). So what possible scenario do we have that would mean strong general AI (let alone the simpler things that’d replace everyone’s jobs) never gets developed? The answers range from total societal collapse to extinction, which are all worse than a bad transition to full automation.
So either AI steals everyone’s job or something worse happens.
Thanks for the detailed and thought provoking response. I stand corrected. I appreciate the depth you went into!
You’re welcome! I’m always happy to learn someone re-evaluated their position in light of new information that I provided. 🙂
Even if AI is able to answer all questions 100% accurately, it wouldn’t mean much either way. Most of programming is making adjustments to old code while ensuring nothing breaks. Gonna be a while before AI will be able to do that reliably.
I always thought of it as a tool to write boilerplate faster, so no surprises for me
Rookie numbers! If we can’t get those numbers up to at least 75% by next quarter, then the whippings will occur until misinformation increases!
They’ve done studies: 48% of the time, it works every time.