Excerpt:
“Even within the coding, it’s not working well,” said Smiley. “I’ll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven’t engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence.”
Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.
“We don’t know what those are yet,” he said.
One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That’s the kind of thing that needs to be assessed to determine whether AI helps an organization’s engineering practice.
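To make the "tokens burned per approved pull request" idea concrete, here is a minimal sketch of how such a metric could be computed. The data shape is hypothetical; real inputs would come from an AI vendor's usage reporting and the forge's PR history.

```python
# Hypothetical sketch: "tokens burned per approved PR" as a team metric.
# The (tokens_used, pr_approved) tuple shape is invented for illustration.

def tokens_per_approved_pr(usage_events):
    """usage_events: iterable of (tokens_used, pr_approved) pairs,
    one per AI-assisted change attempt."""
    total_tokens = sum(tokens for tokens, _ in usage_events)
    approved = sum(1 for _, ok in usage_events if ok)
    if approved == 0:
        return float("inf")  # all that spend, nothing merged
    return total_tokens / approved

# Example: three attempts, two of them merged.
events = [(120_000, True), (450_000, False), (90_000, True)]
print(tokens_per_approved_pr(events))  # 330000.0 tokens per approved PR
```

Note that the failed attempt's tokens still count toward the numerator, which is the point of the metric: wasted spend shows up even when nothing ships.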
To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.
“It passed all the unit tests; the shape of the code looks right,” he said. “It’s 3.7x the lines of code and it performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It’s a dumpster fire. Throw it away. All that money you spent on it is worthless.”
All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.
“Coding works if you measure lines of code and pull requests,” he said. “Coding does not work if you measure quality and team performance. There’s no evidence to suggest that that’s moving in a positive direction.”


I think it’s talking about 100% vibe code. And yeah, it’s pretty useful if you don’t abuse it.
Yeah, it’s really good at short bursts of complicated things. Give me a curl statement to post this file as a snippet into Slack. Give me a connector bot between Ollama and Meshtastic. It’ll give you serviceable, but not perfect, code.
When you get to bigger, more complicated things, it needs a lot of instruction, guard rails and architecture. You’re not going to just “Give me SQLite but in Rust, GO” and have a good time.
I’ve seen some people architect some crazy shit. You do this big, long, drawn-out project: tell it to use a small control orchestrator, set up many agents and have each agent do part of the work, have it create full unit tests, be demanding about best practice, post security checks, ouroboros it and let it go.
But it’s expensive, and we’re still getting venture capital tokens for less than cost, and you’ll still have hard-to-find edge cases. Someone may eventually work out a fairly generic way to set it up to do medium scale projects cleanly, but it’s not now and there are definite limits to what it can handle. And as always, you’ll never be able to trust that it’s making a safe app.
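The "small control orchestrator, many agents, full unit tests" setup described above can be sketched roughly as below. Every function here is a stand-in, not a real agent API: in practice `run_agent` would call an LLM and `run_tests` would run your actual suite.

```python
# Rough sketch of the orchestrator-plus-agents pattern described above.
# run_agent and run_tests are stand-ins for an LLM call and a real test suite.

def run_agent(task, feedback=None):
    # Stand-in: pretend the agent produces a patch for its task,
    # and improves once it receives test feedback.
    return {"task": task, "fixed": feedback is not None}

def run_tests(patch):
    # Stand-in test gate: a patch only passes after a feedback round.
    return patch["fixed"]

def orchestrate(tasks, max_rounds=5):
    """Split work across agents, gate every patch on tests,
    and loop failures back until they pass or rounds run out."""
    results = {}
    for task in tasks:
        patch = run_agent(task)
        for _ in range(max_rounds):
            if run_tests(patch):
                break
            patch = run_agent(task, feedback="tests failed")
        results[task] = run_tests(patch)
    return results

print(orchestrate(["schema", "parser", "api"]))
```

The ouroboros part is the inner loop: test output is fed straight back into the agent until the gate opens, which is exactly where the token bill explodes on real projects.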
Yeah, I find that I need to instruct it comparably to a junior to do any good work… And our junior standard - trust me - is very, very low.
I usually spam the planning mode and check every nook of the plan to make sure it’s right before the AI even touches the code.
I still can’t tell if it’s faster or not compared to just doing things myself… And as long as we aren’t allocated time to compare end to end with 2 separate devs of similar skill, there’s no point even trying to guess imho. Though I’m not optimistic. I may just be wasting time.
And yeah, the true cost per token is probably double what it is today, if not more…
Once you set up a proper scaffold for it in one project, it’s marginally repeatable across other projects. If all you have is one project, that would be crap. Where this will disrupt and kill things is in cheap contract work.
If you’re trying to produce grade-A code parallel to a grade-A developer on a single project, it’s absolutely a losing battle for AI.
But if you have unit tests and say “go upgrade these libraries, test, then fix any problems, and keep that loop until it all works,” it’s about at the point where it can be serviceable.
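The "upgrade, test, fix, repeat" loop described above looks roughly like this as code. `upgrade_libraries`, `run_test_suite`, and `ask_ai_to_fix` are placeholders for real tooling (a package manager, CI, and an LLM call).

```python
# Sketch of the "upgrade, test, fix, keep looping" pattern described above.
# All three callables are placeholders for real tooling.

def upgrade_loop(upgrade_libraries, run_test_suite, ask_ai_to_fix,
                 max_attempts=10):
    """Upgrade once, then keep feeding test failures back to the AI
    until the suite is green or we give up."""
    upgrade_libraries()
    for attempt in range(1, max_attempts + 1):
        failures = run_test_suite()
        if not failures:
            return attempt  # suite went green on this attempt
        ask_ai_to_fix(failures)
    raise RuntimeError("still failing after max_attempts fix rounds")

# Toy run: two failing tests, each "fix" round clears one of them.
state = {"failures": ["test_a", "test_b"]}
attempts = upgrade_loop(
    upgrade_libraries=lambda: None,
    run_test_suite=lambda: list(state["failures"]),
    ask_ai_to_fix=lambda f: state["failures"].pop(),
)
print(attempts)  # 3: two fix rounds, then a green run
```

The important design choice is the hard `max_attempts` cap: without it, an agent that can't actually fix the failures will loop (and bill) forever.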
I’m betting tokens for development go up 50-100 times when it’s all done. I know that sounds shocking, but hear me out.
The AI companies’ bet is that they can get companies to fire enough developers to convert a decent percentage of salaries over to AI. They’re planning on Bob’s Discount Coders firing people making $40-80k and, long term, moving $40k of that salary per head to them.
It’ll be like a streaming service where they’re paying $16 a month for Claude, and they’ll slowly enshittify the service until it’s a grand or two per month per head.
Step 1: Depress the market for coders at a loss, allowing companies to pay less and hoover up the extra money by firing people. That means less computer science in college, making a hole in the job market.
Step 2: Slowly crank up the features and cost until the prices are back where they were, but all the money is flowing to them.
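For what it’s worth, the thread’s own numbers roughly support the claimed multiple: going from $16/month to $1,000-$2,000/month is a 62x-125x increase, in the same ballpark as the 50-100x guess above.

```python
# Quick check of the repricing multiple claimed above, using the
# thread's own numbers ($16/month today, $1,000-$2,000/month later).
today = 16
low, high = 1_000, 2_000
print(low / today, high / today)  # 62.5 125.0
```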
I think you hit the bullseye with this.