
The Dynamic Rubyist

May 28, 2025

What Do I Think I Think About LLMs

Every time I think the AI frenzy has peaked, it peaks again. Writing about coding these days feels like Jimmy Stewart dancing on the edge of a floor that's rapidly receding under him.

I had a draft of this that started with five or six capsule stories of interactions with LLMs for coding purposes, some saving incremental time, some being wrong, some even being right.

Then I realized that I probably shouldn't be that detailed about work stuff, but more importantly, you likely have all these stories too. You've seen useful autocomplete, you've seen the LLM be confidently wrong, and you've seen it be confidently right.

My point is, there are a lot of ways to feel about LLMs as coding tools. I've been holding on to this post for three weeks, and I've changed my mind at least twice.

In two weeks, I'll probably cringe at this.

Here are a few things I think I think...

  1. Reading SF did not prepare me for this

I've been reading science fiction about AI basically since I could read, and I've never seen a treatment of AI that captures how weird and scammy parts of this AI world seem, even as parts of it seem to be delivering value.

  2. I'm getting plagiarized

Part of what makes me really ambivalent about LLM tools is that I know for a dead-certain fact that my 20-year career in technical writing is an (admittedly small) part of the training set for these tools. I don't like that.

  3. Both extremes of the argument about LLMs seem wrong in my experience

Team "Let's push AI" wants to convince me that the existing tools are good enough to be an order of magnitude improvement on my productivity. That's just flat out not true in my experience, though I know some people have reported this and, I guess, good for them. Team "burn it all down" wants to convince me that these tools have no value at all. That's also not true in my experience. There's a lot of things LLMs aren't very good at, but I've gotten code review advice, and sometimes good advice on bugs. I've had it generate tests, and worked with it to create code -- my style is weird enough that it takes some back and forth. The best of the "fancy autocomplete" has gotten good enough to be an incremental improvement in my day. I miss fancy autocomplete when I don't have it. I don't yet miss LLM chat when I don't have it.

  4. This is already much further along than I expected to see in my lifetime

I always thought that computer generated code was coming, but I thought I'd be safely retired. The image generation is well beyond what I would have expected to see in my lifetime.

  5. I'm very dubious of self-reports of time saved

In aggregate, are the LLMs saving me time? Incrementally, probably. Autocomplete has gotten good enough to pretty clearly be an incremental time savings. Code generation feels like an incremental improvement sometimes.

And yeah, I've seen the "I vibe-coded this and now I have paying customers" people, and I dunno. I do not yet have the confidence to run pure LLM code in anything that needs security or touches money.

What is true of my LLM coding adventures is that the nature of the coding task changed. There was much less of me typing in code to generate data; the tool did most of that. There was much more pleading with the thing not to change unrelated code.

People are notoriously bad at evaluating subjective time across different kinds of thought processes. Did it seem faster because there was less busy work? Did it feel longer because I was basically arguing with the AI the whole time? I'm genuinely not sure, and the counterfactual of not using LLMs is hard to estimate.

Way back when I was in ed-tech, we used to say that speeding up a user task 10x fundamentally changed the task. The 90s version of this was, say, Apple's Graphing Calculator, which allowed you to "browse" graphs in a way that wasn't feasible before.

So if I could do a 10 hour coding task in 1 hour (which, to be clear, is nowhere near what I'm seeing), then I've really fundamentally changed the task -- to expand the analogy, I could easily browse different coding structures to try them out before I picked one. My sense is that this is potentially true for prototyping, which does seem to be allowing people to explore a design space more completely.

I will say this -- two years ago I asked an LLM to code up the NFL tiebreaker process and it flopped -- for example, it couldn't reliably keep track of how to count ties. I asked a couple of weeks ago, and I got a full design, with a question of how to roll out the implementation step by step. I haven't taken it to the final steps, so I'm not sure how hard the last details will be to nail down. But the skeleton structure came in very fast and very detailed.
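
For context on the detail that tripped it up two years ago: the NFL computes winning percentage with a tie counting as half a win and half a loss. Here's a minimal sketch of just that calculation -- the record struct and field names are mine, not what the LLM generated, and the real seeding rules go far beyond this:

```ruby
# Hypothetical record shape -- just enough to show how ties are counted.
Record = Struct.new(:wins, :losses, :ties, keyword_init: true) do
  # A tie counts as half a win (and half a loss) for winning percentage.
  def win_pct
    games = wins + losses + ties
    return 0.0 if games.zero?
    (wins + ties * 0.5) / games
  end
end

team = Record.new(wins: 11, losses: 5, ties: 1)
puts team.win_pct # => 0.6764705882352942
```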

  6. You can't tech your way out of non-technical problems

  7. We already have automation

I'll tie these together, because I think they are both limitations of what the tools can do. An LLM is not going to fix communication problems at your organization, and if your developers don't want to learn a thing, an LLM is not magically going to make them learn it. The LLM may let them fake it for a while, but this doesn't seem like a great idea to me...

Meantime, I keep seeing people propose that LLMs tackle normal automation tasks, which I don't quite get. Like, I feel that tying my ticket system to GitHub is a problem I don't need an LLM to solve. Or do I? This is the place where I really do feel like there's something other people are seeing that I'm not. There's a whole world of agents where I can dimly see what the fuss is about, but I haven't experienced the value yet.

The pushback I get on this is that the LLMs are faster than writing bash scripts, which brings me back to the self-reporting issue and the question of whether they really are. The examples I've seen that work seem to have taken enough back and forth with the LLM that I'm genuinely not sure.
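
For a sense of scale, here's the kind of thing I mean by ordinary automation -- a ticket mirrored into a GitHub issue with a short plain-Ruby script. The ticket hash and repo name are made up for illustration; the GitHub call is the standard create-issue REST endpoint:

```ruby
require "net/http"
require "json"
require "uri"

# Create a GitHub issue via the standard REST endpoint.
def create_github_issue(repo:, title:, body:, token:)
  uri = URI("https://api.github.com/repos/#{repo}/issues")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{token}"
  request["Accept"] = "application/vnd.github+json"
  request.body = { title: title, body: body }.to_json

  Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
end

# Hypothetical ticket pulled from whatever tracker you use.
ticket = { "id" => "TICKET-123", "summary" => "Fix login timeout", "description" => "Steps to reproduce..." }

create_github_issue(
  repo: "my-org/my-app",
  title: "[#{ticket["id"]}] #{ticket["summary"]}",
  body: ticket["description"],
  token: ENV.fetch("GITHUB_TOKEN")
)
```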

  8. LLMs make non-coding tasks feel like coding

After hearing people talk about using LLMs to generate PR summaries or internal communication, I had the thought that one reason why developers like LLMs for this is that it turns non-coding communication tasks into something that feels like coding.

Whether this is a good thing or not, I'm not sure.

  9. Conversely, LLMs can make coding tasks feel like non-coding

Obviously one of the other selling points is that powerful coding tasks are now potentially available to non-coders. Vibe-coding is just "coding that doesn't feel like coding".

And like, I don't have a problem with tools solving problems for people. I follow a guy who uses LLMs to write Python scripts that help him manage work spreadsheets. Seems fine to me.

That said, in some of the back and forth I've had with LLMs, I can feel my brain shift out of coding mode while I watch the tool work.

Honestly, it didn't feel great.

And I did have the case of switching back into coder brain and looking at the test and realizing, "that's way more complicated than it needs to be".

  10. The inefficiency is load bearing and you can't make a spark without friction

I pulled the first phrase out of a Bluesky post on media and I think it explains a lot of what's going on in a lot of directions. I keep hearing that LLMs will remove friction, and I think, things without friction are really... slippery, which is a metaphor and maybe not a metaphor?

Sometimes the journey is the destination, and sometimes the act of summarizing or explaining your code is valuable for learning. Things that feel like time sinks can be valuable.

You can take this too far -- sometimes grunt tasks are just grunt tasks. But I think it's worth trying to be careful about what you are trying to accomplish with a tool both in the immediate term and the longer term.

  11. Caring Matters

I'm working on a rule of thumb here, which is that the usefulness of LLMs in generating code is inversely proportional to how much I care about the code.

One-off script in a language I don't know very well that is not going to be reused -- don't really care, LLM can probably help a lot.

Throw-away prototype? I literally don't care about the code itself. LLM probably helpful.

End-to-end test of a system that I already have unit tested? I care that it does what it says, but the style is pretty constrained. LLM could be helpful.

Writing code in a well-constrained piece of business logic? Depending on the constraints, I'm probably arguing about style and long-term issues, so I'm thinking helpful to start and then less so.

Business critical function under web-scale load? I'm going to want to look over that carefully.

Another way of looking at it is that as the code becomes more critical, the typing part that is what the LLM most clearly speeds up becomes less and less of the overall task. (You can get LLMs to help with design, but that takes more back and forth, so it's not as much of a time saver.)

  12. I don't know what to think about LLM generated code quality

Anecdotally, people around me seem to feel that the LLMs generate worse Ruby code than other languages. I'm a little dubious (of the part where the code quality is better in other languages).

I built out (most of) the NFL tiebreaker program, and the code was... serviceable. Like, it probably works, but the factoring of it is a little odd. In general, it has the feel of a very brute-force solution, a little overly complex, but one or two things I might not have thought of. (I've found "overly complex" to be a common problem with LLM code.)

  13. I'm pretty sure the LLM shouldn't generate both the code and the tests

At least not for code you care about.

This is maybe a subset of "don't commit code you don't understand" or the IBM "a computer can never be held accountable" meme.

  14. Try to frame the ways this expands your reach

Where I've been successful with LLMs, it's been using them to expand what I can do -- "what are the design options here", "how can I observe the causes of this error", "how can I tell what this error message might mean". When I've asked for things that contract what I do -- "generate a test for this code", "write this function" -- I've had mixed success: I usually get the code, but there are often problems.

I had an interesting experience with "find examples in this code base where I could use Ruby pattern matching", where it did find legit examples, but the explanations were... weird?
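
For anyone who hasn't played with it yet, this is roughly the shape of refactor it was turning up -- a nested hash conditional collapsed into Ruby 3 pattern matching. The hash shape and the `log_failure`/`process` helpers here are made up, not from my code base:

```ruby
# Before: digging into a nested hash with guard checks.
def handle(response)
  if response[:status] == "error" && response[:error].is_a?(Hash) && response[:error][:code]
    log_failure(response[:error][:code], response[:error][:message])
  else
    process(response[:data])
  end
end

# After: the same branching expressed with case/in pattern matching.
def handle(response)
  case response
  in { status: "error", error: { code:, message: } }
    log_failure(code, message)
  else
    process(response[:data])
  end
end
```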

  15. Sometimes this sounds like previous generations' language debates

And coding has a rich history of The Olds (which I clearly am by now) saying that The Youngs have it too easy. That's assembly to C and it's C to Java, and it's Java to Ruby.

If you go back to the rise of C, you can find assembly language programmers lamenting the loss of granular memory management. If you go back to the rise of Python/Ruby, you can definitely find C programmers lamenting the loss of manual memory management. Me currently lamenting all the things you lose from vibe coding -- is it the same thing? Am I the old person yelling at the tides?

  16. What Kent said

I quote this from Kent Beck all the time: "The value of 90% of my skills just dropped to $0. The leverage for the remaining 10% went up 1000x. I need to recalibrate."

  17. We need to be super careful about what this does to the pipeline

Anecdotally, LLMs are already affecting hiring of junior devs. This is, of course, a problem because junior devs are the leading source of senior devs.

Anecdotally, I have to imagine that this just feels super-weird to mid-level devs who may or may not have developed enough taste to evaluate the LLM output and who likely feel they have to get great at the tools, like, yesterday or be Left Behind.

I mean, ever since Y2K, I've always imagined myself coming out of retirement for One Last Job. But I always assumed it'd be the Unix 2038 issue.

  18. Where the ceiling is is going to matter

I don't know how much better the LLMs are going to get in the next few years. The improvement in the last year or so has been noticeable. Even if the models don't improve, they are likely to get cheaper, faster, and more accessible.

But it's going to make a big difference if the ceiling is "able to competently write small tools" versus "able to manage large systems", and I just don't know where the endpoint is.

I don't really have a conclusion to this, other than it will probably all be invalid in, like, two weeks.


Comments? I've had to disable them on noelrappin.com, if you want to discuss this post, the URL will be https://buttondown.com/noelrap/archive/what-do-i-think-i-think-about-llms/


Dynamic Ruby is brought to you by Noel Rappin.

Comments and archive at noelrappin.com, or contact me at noelrap@ruby.social on Mastodon or @noelrappin.com on Bluesky.

To support this newsletter, subscribe by following one of these two links:

  • Monthly Subscription for $3/month
  • Annual Subscription for $30/year

All opinions and thoughts expressed or shared in this article or post are my own and are independent of and should not be attributed to my current employer, Chime Financial, Inc., or its subsidiaries.