Interesting what he wrote about LLMs’ inability to “zoom out” and see the whole picture. I use Gemini and ChatGPT sometimes to help debug admin / DevOps problems. It’s a great help for extra input, a bit like rubberducking on steroids.
Examples of how it went:
Problem: Apache-cluster and connected KeyCloak-Cluster, odd problems with the login flow. Reducing KeyCloak to 1 node solves it, so it says that we need to debug node communication and tells me how to set the debug log settings. A lot of analysis together. But after a while, it’s pretty obvious that the Apache-cluster doesn’t use the sticky session correctly and forwards requests to the wrong KeyCloak node in the middle of the login flow. The LLM does not see that and wants to continue to dig deeper and deeper into supposedly “odd” details of the communication between KeyCloak nodes, although the combined logs of all nodes show that the error was in load balancing.
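For illustration, the "combined logs" check can be sketched in a few lines. This is only a minimal sketch: it assumes the logs have already been parsed into (session id, node) pairs, and the session ids and node names (`sess-a`, `kc1`, ...) are made up.

```python
from collections import defaultdict


def sessions_hitting_multiple_nodes(entries):
    """Return {session_id: [nodes in first-seen order]} for sessions served by more than one node."""
    nodes_by_session = defaultdict(list)
    for session_id, node in entries:
        if node not in nodes_by_session[session_id]:
            nodes_by_session[session_id].append(node)
    return {s: n for s, n in nodes_by_session.items() if len(n) > 1}


# Hypothetical, already-parsed entries from the combined logs of all nodes.
combined_log = [
    ("sess-a", "kc1"),
    ("sess-b", "kc2"),
    ("sess-a", "kc1"),
    ("sess-b", "kc1"),  # same login flow, different node: stickiness is broken
]

broken = sessions_hitting_multiple_nodes(combined_log)
print(broken)  # {'sess-b': ['kc2', 'kc1']}
```

If the result is non-empty during a login flow, the load balancer is the first suspect, not the communication between the nodes.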
Problem: Apache from a different cluster often returns 413 (payload too large). Indeed it happens with pretty large requests; the limit where it happens is a bit over 8 kB without the body. But the incoming request is valid. So I ask both Gemini and ChatGPT for a complete list of things that cause Apache to do that. They do a decent job at that. And one of the suggestions is close: it says to check for mod_proxy_ajp use, since the observed limit could be caused by trying to build an AJP packet to communicate with backchannel servers. That was not the cause; the actual module was mod_jk, which also uses AJP. It helped me focus on watching out for anything using AJP when reviewing the whole config manually, so I found it, and the “rubberducking” helped indirectly. But the LLM said we must forget about AJP and focus on other possible causes - a dead end. When I told it the solution, it was like: Of course mod_jk. (413 sounds like the request TO the Apache is wrong, but actually, it tries internally to create an invalid AJP packet over 8 kB, and when that fails it blames the incoming request.)
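A rough way to spot requests that are likely to run into this is to compare the size of the request line plus headers against the AJP packet limit. The sketch below is an approximation under stated assumptions: it measures the HTTP wire format, not the real AJP encoding (which packs things differently), and it assumes the 8192-byte default that mod_jk documents for `max_packet_size`. The function names and the sample request are hypothetical.

```python
AJP_DEFAULT_MAX_PACKET = 8192  # bytes; assumed default, check your own worker config


def approx_header_bytes(method, uri, headers):
    """Approximate size of the request line plus headers as sent over HTTP."""
    request_line = f"{method} {uri} HTTP/1.1\r\n"
    header_lines = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    return len((request_line + header_lines + "\r\n").encode("utf-8"))


def may_overflow_ajp(method, uri, headers, limit=AJP_DEFAULT_MAX_PACKET):
    """True if the headers alone are already larger than the assumed AJP packet limit."""
    return approx_header_bytes(method, uri, headers) > limit


# Hypothetical requests: one small, one with a very large cookie.
small = {"Host": "example.org", "Accept": "*/*"}
large = {"Host": "example.org", "Cookie": "session=" + "x" * 9000}

print(may_overflow_ajp("GET", "/app", small))  # False
print(may_overflow_ajp("GET", "/app", large))  # True
```

Because the encodings differ, treat a result near the limit as "worth checking" rather than as a verdict.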
LLMs are useful to provide generic examples of how a function works. This is something that would previously take an hour of searching the docs and online forums, but the LLM can do it very quickly, which I appreciate. But I have a library I want to use that was just updated with entirely new syntax. The LLMs are pretty much useless for it. Back to the docs I go! Maybe my terrible code will help to train the model. And in my field (marine biogeochemistry), the LLM generally cannot understand the nuances of what I’m trying to do. Vibe coding is impossible. And I doubt the training set will ever be large or relevant enough for the vibe coding to be feasible.
Vibe coding
The term for that is actually ‘slopping’. Kthx ;-)
That’s simply not true. LLMs with RAG can easily catch up with new library changes.
Subjectively speaking, I don’t see it doing that good a job of being current or prioritizing current over older.
While RAG is the way to give an LLM a shot at staying current, I just didn’t see it doing that good a job with library documentation. Maybe it can do all right with tweaks like additional properties or arguments, but more structural changes to libraries I just don’t see being handled.
Exactly. It’s a very niche library (tmap for R) and it was just completely overhauled. Gemini, ChatGPT and Copilot all seem pretty confused and mix up the old and new syntax.
That depends a lot on the implementation of the LLM engine. For Python or JS you can feed it the API schema of the entire virtual environment.
You can’t know without checking though, it may be wrong
Note: this comes from someone that makes a (very good) ide which they only monetize with an AI subscription so it’s interesting to see their take
(They use Claude Opus like all the others so the results are similar)
in one regard I can understand, they’re running a business and don’t want to be at a disadvantage against their competition.
on the other hand have some conviction for your product, otherwise I will lose confidence that your product is as good as your marketing makes it seem.
They are still bullish on LLM, just to augment rather than displace human suggested development.
This perspective is quite consistent with the need for a product that manages prompting/context for a human user and helps the human review and integrate the LLM supplied content in a reasonable way.
If LLM were as useful as some of the fanatics say, you’d just use a generic prompt and it would poop out the finished project. This is by the way the perspective of an executive I talked to not long ago, that he was going to be able to let go of all his “coders” and feed his “insight” directly into a prompt that will do it all for him instead. He is also easily influenced so articles like this can reshape him into a more tenable position, after which he’ll pretend he never thought a generic prompt would be good enough
LLMs have made it really clear when previous concepts actually grouped things that were distinct. Not so long ago, Chess was thought to be uniquely human, until it wasn’t, and language was thought to imply intelligence behind it, until it wasn’t.
So let’s separate out some concerns and ask what exactly we mean by engineering. To me, engineering means solving a problem. For someone, for myself, for theory, whatever. Why we want to solve the problem, what we want to do to solve the problem, and how we do that are often blurred together. Now, AI can supply the how in abundance. Too much abundance, even. So humans should move up the stack, focus on what problem to solve and why we want to solve it. Then, go into detail to describe what that solution looks like. So for example, making a UI in Figma or writing a few sentences on how a user would actually do the thing. Then, hand that off to the AI once you think it’s sufficiently defined.
The author misses a step in the engineering loop that’s important though. Plans almost always involve hidden assumptions and undefined or underdefined behavior that implementation will uncover. Even more so with AI, you can’t just throw a plan and expect good results, the humans need to come back, figure out what was underdefined or not actually what they wanted, and update the plan. People can ‘imagine’ rotating an apple in their head, but most of them will fail utterly if asked to draw it; they’re holding the idea of rotating an apple, not actually rotating the apple, and application forces realization of the difference.
The author misses a step in the engineering loop that’s important though. Plans almost always involve hidden assumptions and undefined or underdefined behavior that implementation will uncover.
His whole point is two mental models and a model delta. Exactly what you just described.
I’ve done a test of 8 LLMs, on coding. It was using the J language, asking all of them to generate a chess “mate in x solver”
Even the bad models were good at organizing code, had some understanding of chess, and were good at understanding the ideas in their prompts. The bad models were bad mostly on logic: not understanding indexing/amend on a table, not understanding proper function calling, or proper decomposition of arguments in J. Bad models included Copilot and OpenAI’s 120B open-source model. Kimi K2 was ok. Sonnet 4 was the best. I’ve mostly used Qwen 3 235B because it is more freely accessible than Sonnet 4, and because it has a giant context that makes it think harder (slower) and better the more it’s used on a problem. Qwen 3 did a good job of writing a fairly lengthy chess position scoring function, then separating it into 2 functions (quick and medium), incorporating self-written library code, and recommending enhancements.
There is a lot to get used to in working with LLMs, but the right ones can generally help with the code-writing process, i.e. there exist some code outputs which, even when wrong, provide a faster path to objectives than if that code output did not exist. No matter how bad the code output, you are almost never dumber for having received it, unless perhaps you don’t understand the language well enough to know it’s bad.
The LLM worship has to stop.
It’s like saying a hammer can build a house. No, it can’t.
It’s useful to pound in nails and automate a lot of repetitive and boring tasks but it’s not going to build the house for you - architect it, plan it, validate it.
It’s similar to the whole 3D printing hype. You can 3D print a house! No you can’t.
You can 3D print a wall, maybe a window.
Then have a skilled Craftsman put it all together for you, ensure fit and finish and essentially build the final product.
You’re making a great analogy with the 3D printing of a house.
However, if we consider the 3D printed house scenario; that skilled craftsman is now able to do things on his own that he would have needed a team for in the past. Most, if not all, of the less skilled members of that team are not getting any experience within the craft at that point. They’re no longer necessary when one skilled person can now do things on their own.
What happens when the skilled and highly experienced craftsmen that use AI as a supplemental tool (and subsequently earn all the work) eventually retire, and there’s been no juniors or mid-levels for a while? No one is really going to be qualified without having had exposure to the trade for several years.
Absolutely. This is a huge problem and I’ve read about this very problem from a number of sources. This will have a huge impact on engineering and information work.
Interestingly enough, a similar shortage occurred in the trades when information work was up and coming and the trades were shunned as a career path by many. Now we don’t have enough plumbers and electricians. Tradespeople are now finding their skills in high demand and charging very high rates.
The trades problem is a typical small business problem with toxic work environments. I knew plenty that washed out of the trades because of that. It’s the “nobody wants to work anymore” tradesmen, but really it’s “nobody wants to work with me for what I’m willing to pay”
I don’t doubt that that’s a problem either in some of those small businesses.
I have a great electrician that I call all the time. He’s probably in his late 60s. It’s definitely more of a rough and tumble work environment than IT work, for sure, but he’s a good guy and he pays his people well and he charges me an arm and a leg.
But we talk about it and he tells me how he would have charged a quarter of the price for the same work just 10 years ago. And honestly, he’s one of the more affordable ones.
So it definitely seems like the trades is the place to be these days with so few good ones around. But yeah you have to pick and choose who’s mentoring you.
I hate the simulated intelligence nonsense at least as much as you, but you should probably know about this if you’re saying you can’t 3d print a house: https://youtu.be/vL2KoMNzGTo
Yeah I’ve seen that before and it’s basically what I’m talking about. Again, that’s not “printing a 3D house” as hype would lead one to believe. It is extruding cement to build the walls around very carefully placed framing, heavily managed and coordinated by people, and finished with plumbing, electrical, etc.
It’s cool that they can bring this huge piece of equipment to extrude cement to form some kind of wall. It’s a neat proof of concept. I personally wouldn’t want to live in a house that looked anything like or was constructed that way. Would you?
I mean, “to 3d print a wall” is a massive, bordering on disingenuous, understatement of what’s happening there. They’re replacing all of the construction work of framing and finishing all of the walls of the house, interior and exterior, plus attaching them and insulating them, with a single step.
My point is if you want to make a good argument against LLMs, your metaphor should not have such an easy argument against it at the ready.
Did you see another video about this? The one linked only showed the walls and still showed them doing interior framing. Nothing about windows, electrical, plumbing, insulation, etc.
What they showed could speed up construction but there are tons of other steps involved.
I do wonder how sturdy it is since it doesn’t look like rebar or anything else is added.
I’m not an expert on it, I’ve only watched a few videos on it, but from what I’ve seen they add structural elements between the layers at certain points which act like rebar.
There’s no framing of the walls, but they do set up scaffolds to support overhangs (because you can’t print onto nothing)
I’m with you on this. We can’t just casually brush aside a machine that can create the frame of a house unattended - just because it can’t also do wiring. It was a bad choice of image to use to attack AI. In fact it’s a perfect metaphor for what AI is actually good for: automating certain parts of the work. Yes you still need an electrician to come in, just like you also need a software engineer to wire up the UI code their LLM generated to the back end, etc.
You circled all the way back to the original point lol. The whole thrust of this conversation is “AI can be used to automate parts of the work, but you still need knowledgeable people to finish it”. Just like “a concrete 3d printer can be used to automate parts of building a house, but you still need knowledgeable people to finish it.”
Spoken like a person who has never been involved in the construction of a home. It’s effectively doing the job of (poorly) pouring concrete which isn’t the difficult or time consuming part.
My dude, I worked home renovations for many years. Nice try to discredit me rather than my argument though.
Ah, my apologies. I had interpreted your message to suggest that pouring cement from a robotic arm fully replaced all of the construction work of framing and finishing all of the walls of the house, interior and exterior, plus attaching them and insulating them, with a single step.
it’s basically what I’m talking about
Well, a minute ago you were saying that AI worship is akin to saying
a hammer can build a house
Now you’re saying that a hammer is basically the same thing as a machine that can create a building frame unattended? Come on. You have a point to be made here but you’re leaning on the stick a bit too hard.
3d printed concrete houses exist. Why can’t you 3d print a house? Not the best metaphor lol
You can certainly 3D print a building, but can you really 3D print a house? Can it 3d print doors and windows that can open and close and be locked? Can it 3D print the plumbing and wiring and have it be safe and functional? Can it 3D print the foundation? What about bathroom fixtures, kitchen cabinets, and things like carpet?
It’s actually not a bad metaphor. You can use a 3D printer to help with building a house, and to 3D print some of the fixtures and bits and pieces that go into the house. Using a 3D printer would automate a fair amount of the manual labor that goes into building a house today (at least how it is done in the US). But you’re still going to need people who know what they are doing to put it all together to transform the building into a functional home. We’re still a fair ways away from just being able to 3D print a house, just like we’re a fair ways away from having an LLM write a large, complex piece of software.
Exactly this.
You don’t like glass windows? Air conditioning? A door?
No they aren’t. With enough setup and very unique and expensive equipment, you can pour shitty concrete walls that will be way more expensive and worse than if you did it normally. That will give you 20% of the house, at best. 20% of a not very good house.
I think it’s going to require a change in how models are built and optimized. Software engineering requires models that can do more than just generate code.
You mean to tell me that language models aren’t intelligent? But that would mean all these people cramming LLMs in places where intelligence is needed are wasting their time?? Who knew?
Me.
I have a solution for that, I just need a small loan of a billion dollars and 5 years. #trustmebro
Only one billion?? What a deal! Where’s my checkbook!?
Well, they will simply fire many and leave the required number of workers to work with AI. This is exactly what they will want to do at any convenient opportunity. But those who remain will still have to check everything carefully in case the AI made a mistake somewhere.
Clearly LLMs are useful to software engineers.
Citation needed. I don’t use one. If my coworkers do, they’re very quiet about it. More than half the posts I see promoting them, even as “just a tool,” are from people with obvious conflicts of interest. What’s “clear” to me is that the Overton window has been dragged kicking and screaming to the extreme end of the scale by five years of constant press releases masquerading as news and billions of dollars of market speculation.
I’m not going to delegate the easiest part of my job to something that’s undeniably worse at it. I’m not going to pass up opportunities to understand a system better in hopes of getting 30-minute tasks done in 10. And I’m definitely not going to pay for the privilege.
I’m not a “software engineer” but a lot of people that don’t work within tech would probably call me one.
I’m in Cloud Engineering, but came from the sys/network admin and ops side of things rather than starting off in dev or anything like that.
Up until about 5 years ago, I really only knew Powershell and a little bit of bash. I’ve gotten up to speed in a lot of things but never officially learned python, js, go or any other real development language that would be useful to me. I’ve spent way more time focusing on getting good with IaC, and probably more of the SRE type stuff.
In my particular situation, LLMs are incredibly useful. It’s fair to say that I use them daily now. I’ve had it convert bash scripts to python for me very quickly. I don’t know python but now that I’m able to look at a python script next to my bash; I’m picking up on stuff a lot faster. I’m using Lambda way more often as a result.
Also, there’s a lot of mundane filling out forms shit that I delegate to an LLM. I don’t want to spend my time filling out a form that I know no one is actually going to read. F it, I’ll have the AI write a report for an AI. It’s dumb as shit, but that’s the world today.
I’ve found them useful, sometimes, but nothing like a fraction of what the hype would suggest.
They’re not adequate replacements for code reviewers, but getting an AI code review does let me occasionally fix a couple of blunders before I waste another human’s time with them.
I’ve also had the occasional bit of luck with “why am I getting this error” questions, where it saved me 10 minutes of digging through the code myself.
“Create some test data and a smoke test for this feature” is another good timesaver for what would normally be very tedious drudge work.
What I have given up on is “implement a feature that does X” questions, because it invariably creates more work than it saves. Companies selling “type in your app idea and it’ll write the code” solutions are snake-oil salesmen.
I’ve only found two effective uses for them. Every time I tried them otherwise they fell flat and took me longer than it would have to write the code myself.
The first was a greenfield personal project where I let code quality wane since I was the only person maintaining it, and wanted to test LLMs. The other was to write highly repetitive data tests where the model can simply type faster than me.
For anything that requires writing code that needs to be maintained by multiple people, or for systems older than 2 years, it has fallen completely flat. In cases like that I spend more time telling the LLM it is doing it wrong than it would have taken me to write the code in the first place. In 95% of cases, I am still faster than an LLM at solving a problem and writing the code.
I don’t use one, and my coworkers that do use them are very loud about it, and worse at their jobs than they were a year ago.
I have been using it a bit, still can’t decide if it is useful or not though… It can occasionally suggest a blatantly obvious couple of lines of code here and there, but along the way I get inundated with annoying suggestions that are useless and I haven’t gotten used to ignoring them.
I mostly work with a niche area the LLMs seem broadly clueless about, and prompt driven code is almost always useless except when dealing with a super boilerplate usage of a common library.
I do know some people that deal with amazingly mundane and common functions and they are amazed that it can pretty much do their jobs, but they never really impressed me before anyway and I wondered how they had a job…
https://survey.stackoverflow.co/2025/ai/
47% daily use
47% daily use
That is NOT what that says. It says 47% of STACK OVERFLOW RESPONDENTS REPORT using AI. That does not represent 47% of devs.
If you go to 4chan and poll the chuds, you’re going to get a high percentage of respondents affirming your query. You went to stackoverflow and asked about AI. Think about the user base.
thanks but i felt like that’d be obvious from the URL lol. the SO survey is probably the largest sample size we have for this…
…that isn’t outright from an AI company (not that SO doesn’t have AI but they’re still an answers company as opposed to, say, Cursor AI whose main selling point is the AI. even Zed, the company behind the blog linked in the post, has a much higher emphasis on AI) and their sample should be pretty close to all online devs, maybe slightly exclusionary of very experienced ones. SO’s evangelist proportion is not even close to 4chan’s chud proportion; not sure why you had the impression you needed to make that comparison.
it’s not like Codidact has a dev survey and even if they had one they’d have as much bias as this comment section
I don’t work in IT, but I do know you need creativity to work in the industry, something which the current LLM/AI doesn’t possess.
Linguists also dismiss LLMs in a similar vein because LLMs can’t grasp context. It is always funny to be sarcastic and ironic with an LLM.
Soft skills and culture are what the current iteration of LLMs lacks. However, I do think there is still huge potential for AI development in decades to come, but I want this AI bubble to burst as an “in your face” to companies.
Good article, I couldn’t agree with it more, it’s exactly my experience.
The tech is being developed really fast and that is the main issue when talking about AI. Most AI haters are using the issues we might have today to discredit the whole technology, which makes no sense to me.
And this issue the article talks about is apparent and whoever solves it will be rich.
However, it’s interesting to think about the issues that come next.
Like the guy whose baby doubled in weight in 3 months and thus he extrapolated that by the age of 10 the child would weigh many tons, you’re assuming that this technology has a linear rate of improvement of “intelligence”.
This is not at all what’s happening - the evolution of things like LLMs in the last year or so (say between GPT4 and GPT5) is far less than it was earlier in that Tech and we keep seeing more and more news on problems with training it further and getting it improved, including the big one, which is that training LLMs on the output of LLMs makes them worse, and the more the output of LLMs is out there, the harder it gets to train new iterations with clean data.
(And, interestingly, no Tech has ever had a rate of improvement that didn’t eventually tail off, so it’s a peculiar expectation to have for a specific Tech that it will keep on steadily improving)
With this specific path taken in implementing AI, the question is not “when will it get there” but rather “can it get there or is it a technological dead-end”, and at least for things like LLMs the answer increasingly seems to be that it is a technological dead-end for the purpose of creating reasoning intelligence and doing work that requires it.
(For all your preemptive defense by implying that critics are “ai haters”, no hate is required to do this analysis, just analytical ability and skepticism, untainted by fanboyism)
The difference here is that the current ai tech advancements are not just a consequence of one single tech, but of many.
Everything you wrote you believe, depends on this being one tech, one dead end.
The real situation is that we finally have the hardware and the software to make breakthroughs. There is no dead end to this. It’s just a series of steps, each contributing by itself and by learning from its mass implementations. It’s like we got the first taste of AI and we can’t get enough. Even if it takes a while to get to the next advancement.
That doesn’t even make sense - merely having multiple elements which add up to a specific tech is not what makes it capable of reaching a specific goal, just like throwing multiple ingredients into a pot doesn’t guarantee you a tasty dish as output. And you have absolutely no proof that “we finally have the hardware and the software to make breakthroughs”, hence you can’t anchor the forecast that the stuff done on top of said hardware and software will achieve a great outcome entirely on your assertion that “it’s made up from stuff which can do greatness”.
As for the tech being a composition of multiple tech elements, that doesn’t mean much: most dishes too are a composition of multiple elements and that doesn’t mean that any random combination of stuff thrown into a pot will make a good dish.
That idea that more inputs make a specific output more likely is like claiming that “the chances of finding a needle increase with the size of the haystack” - the very opposite of reality.
Might want to stop using LLMs to write your responses and engage your brain instead.
Ah, there go the insults. Surely the best way to display the superiority of your argument lol. And show who is the rational one in any conversation. But I’ll let the first one slide, ok. Anyone can have a weak moment. For sure I had many.
My post makes sense. You can claim, as you have, that multiple ingredients don’t guarantee a tasty dish and fair enough, but on the other hand the opposite is also obviously not true. So I claim that’s not an argument against what I said by logic itself. I can also say that’s not a good comparison. We have a technology that is already giving us results. You can claim they aren’t good, but considering how many people use it already, that by itself could refute that claim, without mentioning any case studies, of which there are plenty.
To the meat of the thing. Maybe I can’t claim that we are headed for an AI nirvana, but equally you can’t say LLMs are in any kind of dead end, especially not one that will mean AI stagnation for the medium future. But I can safely claim we are far closer than we were 3 years ago, by many orders of magnitude. The reasons being exactly hardware and LLMs. And this is exactly the reason for investments in the very same tech, infrastructure, companies, institutions, universities, (…), that would invent new technology in AI. So, in the worst case scenario for the LLMs, they have accelerated the investments and improved the infrastructure for future inventions. Worst case.
Got more insults?
Sure mate, your logic is flawless and you’re not at all pretty much just using fallacies and axiomatic statements to make the case that “this is going to be the greatest thing ever (invest now!)” like all the other types selling their book on some tech hype as has become common since the 90s and anybody pointing this out is really just insulting you by not accepting your clear genius.
Life must be hard for the benevolent AI Investor just trying to share with others how the tech domain they’re invested in is CERTAIN to become the greatest thing ever because it’s made on top of elements which are CERTAIN to be the elements that will one day deliver the greatest thing ever, only to get insulted by people daring to point out that all that certainty isn’t backed by anything but “trust me”.
Your insults are so great that I know now you have bested me. Incredible debate strategy to ignore all the arguments and go straight for the jugular with personal attacks. Truly remarkable rhetorical capabilities! I salute you! Have a nice day. Bye.
It’s true, the tech will get better in the future, we just need to believe and trust the plan.
Same thing with crypto and NFT’s. They were 99% scam by volume, but who wouldn’t love moving their life savings into a digital ecosystem controlled by a handful of rich gambling addicts with no consumer protections? Imagine, you’ll never need to handle dirty paper money ever again, we’ll just put it all in a digital wallet somewhere controlled by someone else coughmastercardcough.
And another thing, we were too harsh on the Metaverse. Sure, spending 8 hours in VR could make you vomit, and the avatars made ET for the Atari look like Uncharted 4, but it was just in its infancy!
I too want to outsource all my critical thinking to a chatbot controlled by a wealthy insular narcissist who throws Nazi salutes. The technology just needs time to mature. Who knows, maybe it can automate the exile of birthright citizens for us too!
/s
That’s exactly the hyperbole I was talking about. Your post is full of obvious fallacies, but the fact that you are pushing everything to the absolutes is the silliest one.
Your whole point is discounting the experience of 50 years in technological evolution (that all technological branches invariably slow down and stop improving) and the last 20 years of hype in Tech (literally everything is pushed like crazy as “the next big thing” by people trying to make a lot of money from it, and almost all of it isn’t), so that specific satirical take on your post is well deserved.
Satirical? That didn’t fit the description of the word at all. You should check a dictionary.
All technological branches invariably slow down? Ever heard of Moore’s law? I’m just gonna stop here and not talk to you again. It’s clear, you just want a conflict and I don’t think you have much else to offer. Bye.
The idea of the mental model CAN be done by AI.
In my experience, if you get it to build a requirements doc first, then ask it to implement that while updating it as required (effectively its mental state), you will get pretty good output with decent ‘debugging’ ability.
This even works ok with the older ‘dumber’ models.
That only works when you have a comprehensive set of requirements available though. It works when you want to add a new screen/process (mostly) but good luck updating an existing one! (I haven’t tried getting it to convert existing code to a requirements doc - anyone tried that?)
I tried feeding ChatGPT a Terraform codebase once and asked it to produce an architecture diagram of what the code base would deploy to AWS.
It got most of the little blocks right for the services that would get touched. But the layout and traffic direction flow between services was nonsensical.
Truth be told it did do a better job than I thought it would initially.
The trick is to split up the tasks into chunks.
Ask it to identify the blocks.
Then ask it to identify the connections.
Then ask it to produce the diagram.
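The three steps above can be sketched as a small pipeline. This is only an illustration: `ask_llm` is a hypothetical callable standing in for whatever chat API is used, the prompts are made up, and a stub replaces the real model so the example runs on its own.

```python
def diagram_in_steps(codebase_text, ask_llm):
    """Run the three prompts in order, feeding each answer into the next prompt."""
    blocks = ask_llm(
        f"List the services/resources defined in this code:\n{codebase_text}"
    )
    connections = ask_llm(
        f"Given these blocks:\n{blocks}\n"
        f"List the connections between them in this code:\n{codebase_text}"
    )
    diagram = ask_llm(
        f"Draw an architecture diagram from these blocks:\n{blocks}\n"
        f"and these connections:\n{connections}"
    )
    return {"blocks": blocks, "connections": connections, "diagram": diagram}


# Stub in place of a real model: records each prompt and returns a numbered answer.
calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return f"<answer {len(calls)}>"


result = diagram_in_steps('resource "aws_s3_bucket" "logs" {}', fake_llm)
print(len(calls))         # 3
print(result["diagram"])  # <answer 3>
```

The point of the structure is that the human can review the block list and the connection list before the diagram step runs.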
Which means you just did four things to help the AI which the AI can’t do itself. That makes it a tool: useful in some applications, not useful in others, and constantly requiring a human to properly utilize it.