Excerpt:

“Even within the coding, it’s not working well,” said Smiley. “I’ll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven’t engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence.”

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.

“We don’t know what those are yet,” he said.
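The established metrics Smiley lists are commonly known as the DORA "four keys." As a rough sketch of how they might be computed from deployment and incident records (the data shapes and field names here are assumed for illustration, not taken from any specific tool):

```python
from datetime import datetime

# Hypothetical deployment and incident records; all fields are illustrative.
deployments = [
    {"deployed_at": datetime(2025, 1, 6), "commit_at": datetime(2025, 1, 3), "failed": False},
    {"deployed_at": datetime(2025, 1, 8), "commit_at": datetime(2025, 1, 7), "failed": True},
    {"deployed_at": datetime(2025, 1, 13), "commit_at": datetime(2025, 1, 10), "failed": False},
]
incidents = [
    {"opened": datetime(2025, 1, 8, 9), "restored": datetime(2025, 1, 8, 11)},
]

window_days = 14

# Deployment frequency: deploys per day over the observation window.
deploy_frequency = len(deployments) / window_days

# Lead time to production: average days from commit to deployment.
lead_time_days = sum(
    (d["deployed_at"] - d["commit_at"]).days for d in deployments
) / len(deployments)

# Change failure rate: fraction of deployments that caused a failure.
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)

# Mean time to restore: average hours from incident open to restoration.
mttr_hours = sum(
    (i["restored"] - i["opened"]).total_seconds() / 3600 for i in incidents
) / len(incidents)

print(deploy_frequency, lead_time_days, change_failure_rate, mttr_hours)
```

None of these four numbers is moved by generating more lines of code, which is Smiley's point: they measure outcomes, not output.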

One metric that might be helpful, he said, is measuring the tokens burned to get to an approved pull request (a formally accepted code change). That's the kind of thing that needs to be assessed to determine whether AI helps an organization's engineering practice.
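A tokens-per-approved-PR metric could be sketched as follows. This is a minimal illustration under assumed data shapes (the session records and field names are hypothetical); the key design choice is charging tokens spent on abandoned work against the PRs that actually landed:

```python
# Hypothetical AI-coding session records; all fields are illustrative.
pr_sessions = [
    {"pr": 101, "approved": True,  "tokens": 180_000},
    {"pr": 102, "approved": False, "tokens": 95_000},   # abandoned, but tokens were still spent
    {"pr": 103, "approved": True,  "tokens": 40_000},
]

total_tokens = sum(s["tokens"] for s in pr_sessions)
approved_prs = sum(s["approved"] for s in pr_sessions)

# Dividing ALL tokens (including dead ends) by approved PRs makes
# wasted generation show up as a cost rather than vanishing.
tokens_per_approved_pr = total_tokens / approved_prs
print(tokens_per_approved_pr)
```

A raw PR count would score this team as productive; the cost-per-accepted-change framing surfaces the waste.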

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

“It passed all the unit tests, the shape of the code looks right,” he said. “It’s 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It’s a dumpster fire. Throw it away. All that money you spent on it is worthless.”

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.

“Coding works if you measure lines of code and pull requests,” he said. “Coding does not work if you measure quality and team performance. There’s no evidence to suggest that that’s moving in a positive direction.”

  • BrightCandle@lemmy.world · 14 points · 21 hours ago

    I keep trying to use the various LLMs that people recommend for coding for various tasks, and they don’t just get things wrong. I have been doing quite a bit of embedded work recently, and some of the designs they come up with would cause electrical fires, it’s that bad. Where the earlier versions would be like “oh yes, that is wrong, let me correct it…” and then often get it wrong again, the new ones will confidently tell you that you are wrong. When you tell them it set on fire, they just don’t change.

    I don’t get it. I feel like all these people claiming success with them are just not very discerning about the quality of the code it produces, or worse, just don’t know any better.

    • Fedizen@lemmy.world · 4 points · 8 hours ago

      Lowkey I think anyone saying LLMs are useful for work is telling everyone around them that their job is producing mostly low-quality work and could reasonably be cut.

    • Shayeta@feddit.org · 8 points (1 down) · 15 hours ago

      It is possible to get good results; the problem is that you yourself need to have a very good understanding of the problem and how to solve it, and then accurately convey that to the AI.

      Granted, I don’t work on embedded, and I’d imagine there’s less code available for AI to train on than in other fields.

      • ironhydroxide@sh.itjust.works · 6 points · 11 hours ago

        Yes, I definitely want to train a new hire who is superlatively confident that they are correct, while also having to do my own job correctly, while said new hire keeps putting shit in my work.