Hello World!
Preamble: There is no such thing as Artificial Intelligence
I am not going to be able to write about language models without getting this out of the way, so bear with me here.
AI is not a real thing and the people selling it are lying to you. It never has been and I sincerely doubt that it ever will be. The people that claim to the world (and investors) that they are building "AI" are just awful (seriously, read this paper), and probably the last people I would trust to build anything important (as an aside, they have, however, actually built something that I, as someone who was made directly homeless by the 2008 collapse, consider the single most horrifying financial bubble that I have ever seen.) What these bizarre cultish cretins have built and sold as Artificial Intelligence is something called Large Language Models (LLMs).
Large Language Models are not intelligent, artificial or otherwise. They are pieces of software that do math to predict the statistically most likely next token (a chunk of text that's 3-4 characters long on average) to come after a given input. That's it. This can make them decently serviceable text generators for some (very narrow) purposes. Since truth is not, as a concept, something that can be derived from statistical likelihood, they are amazing at creating bullshit. In a sense they only create bullshit, just to varying degrees of usefulness depending on compositionality and logarithmic space. I'm just kidding, it's luck. The usefulness of the bullshit relies on luck. You pull the lever on the Bullshit Slot Machine and hope for the best.
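For a rough feel of what "predict the statistically most likely next token" means, here is a toy sketch in Python. It is a big simplification and not how any real LLM is built: it uses whole words as tokens instead of 3-4 character chunks, it counts word pairs instead of running a neural network, and the corpus and function names are made up for illustration. The core idea is the same, though: look at what came before, pick whatever most often came next.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for every token, which tokens followed it and how often."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never followed."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; whole words stand in for tokens.
corpus = "the cat sat on the mat and the cat ate the fish".split()
model = train_bigram(corpus)

# "the" was followed by "cat" twice, "mat" once and "fish" once.
print(most_likely_next(model, "the"))
```

Notice that nothing in there checks whether the output is true. It picks "cat" because "cat" showed up most often after "the", and for no other reason.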
All that aside, I actually do think that LLMs are kind of neat, technically. They're improving incrementally (the slot machine keeps feeling a little nicer) for my particular use case, and I could see a future where they might actually be generally useful for a small subset of tasks (though I could also easily see a future where they are not). The biggest barrier to that improvement is, unsurprisingly, the companies that initially popularized them.
The Two Types of "AI" Companies
There is the kind that focuses on making language models more addictive, and there is the kind that focuses on improving the underlying technology. One type keeps its models on its own servers and charges exorbitant amounts of money to use them, and the other gives you its models for free, to use and modify however you want on your own hardware. One group competes for hearts and minds (and regulatory capture) and the other group literally just competes to make their models produce more useful output while demanding fewer resources to run. This is the fundamental difference between what is happening in America and what is happening in China, and I've been having a blast following the whole saga for the past ~18 months.
I first heard about all of this after chipmaker Nvidia lost over half a trillion dollars (trillion with a T!) in market value when a model called DeepSeek was released. DeepSeek was made in China, which doesn't have easy access to Nvidia's newest hardware. Its developers didn't use Nvidia's state-of-the-art hardware to make it, and they gave it away for free, meaning you didn't need Nvidia's fancy chips to use it. The most valuable company in the world was propped up by the idea that AI is real, and the future, and most importantly, American.
Ramble about American exceptionalism aside, the part that interested me the most was that DeepSeek gave the model away. You could download it and run it. They couldn't shut it off or modify it or charge you money if you used it. They couldn't even stop you from running it yourself and charging others to use it. You could be your own weird little Sam Altman, king of a dumb "AI" startup and nobody could do a thing about it! I don't personally care to do that, but it is worth noting that the list of things you can't do with OpenAI's models is comparatively enormous.
The difference in approach here isn't just philosophical. OpenAI, Anthropic, etc. rely on you sitting in their casinos, pulling the levers on their slot machines, using your real money for every spin. This creates a perverse incentive to keep your butt in the seat that is diametrically opposed to minimizing the number of spins until you get the output that you want. There's a point where improving the software runs counter to their business model. Conversely, a company that makes slot machines you can take home, where you can pull the lever infinite times without giving them any money, lacks the incentive to hobble or sabotage them. There's no reason to give your money to a company that will make the product worse and your life worse if they can!
Hold On Real Quick
I need to pause here and say that this newsletter will cover a lot of things about the AI industry, including my practical thoughts about actually using LLMs, which means I've got a few things to make clear right off the bat.
I only use language models for one purpose: writing and troubleshooting code, and even then with specific caveats. I hate writing or debugging JavaScript, and I am also coincidentally very bad at it. I like the idea of making things visually appealing, but every time I try to write CSS I end up getting up and doing literally anything else.
That's it.
I don't ask it questions. I don't use it as a research tool. I don't use it as a search engine. I don't use it to help make decisions. I never share any personal information with it. I don't use it when I write emails or blog posts. I. definitely. don't. just. talk. to. it. (Every word in that last sentence was a different link.) There will never be an improvement in LLM technology that makes me change my position on this. This means that when I say "DeepSeek Pro v4 on high thinking mode is pretty good at planning" I am talking about how the code I want it to generate is organized.
Current Setup
My next post will be much more in-depth about my current setup and how I made my choices. This post is already long, so here is just a quick note about all that, mostly phrased for people that are familiar with using LLMs for this stuff.
As of this writing, I have been using DeepSeek v4 heavily on my side projects and have been happy with the code quality and cost. It is insanely cheap. Like it is actually somehow even cheaper than it looks on the pricing page. The caveat here is that you have to use DeepSeek directly as the inference provider to see the savings. For some reason OpenRouter's cache-hit rate was way lower than it should've been when I used it, so operations that should've cost < $0.001 (yes I wrote that correctly) ended up costing upwards of $0.08.
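To show why the cache-hit rate matters so much, here is a back-of-the-envelope sketch in Python. Every number in it is a placeholder: the three prices are hypothetical and are NOT DeepSeek's or OpenRouter's actual rates, and the token counts are invented. The only assumption it leans on is that cached input tokens are billed far cheaper than uncached ones, so the same request can cost wildly different amounts depending on how much of the prompt hits the cache.

```python
def request_cost(input_tokens, output_tokens, cache_hit_rate,
                 price_hit, price_miss, price_out):
    """Estimate the dollar cost of one request.

    Prices are in dollars per million tokens. `cache_hit_rate` is the
    fraction of input tokens served from the provider's prompt cache.
    """
    hit_tokens = input_tokens * cache_hit_rate
    miss_tokens = input_tokens - hit_tokens
    total = (hit_tokens * price_hit
             + miss_tokens * price_miss
             + output_tokens * price_out)
    return total / 1_000_000


# Hypothetical prices (dollars per million tokens), for illustration only.
PRICE_HIT = 0.01   # cached input tokens
PRICE_MISS = 1.00  # uncached input tokens
PRICE_OUT = 2.00   # output tokens

# A made-up agentic coding step: a big repeated context, a small reply.
good_cache = request_cost(50_000, 200, 0.98, PRICE_HIT, PRICE_MISS, PRICE_OUT)
no_cache = request_cost(50_000, 200, 0.0, PRICE_HIT, PRICE_MISS, PRICE_OUT)

print(f"with cache hits:    ${good_cache:.5f}")
print(f"without cache hits: ${no_cache:.5f}")
```

With these made-up numbers the uncached request comes out well over ten times more expensive, which is the same shape of problem as the gap described above, even if the real prices are different.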
For whatever reason, both the Flash and Pro versions seem to work better on High thinking mode rather than Max, but again it is a slot machine, so, you know. Carrying a magic rock in your pocket might also help.
I use Kilo Code as a VSCode plugin, and I've been using Matt Pocock's skills plugins, which have actually been very nice. Kilo doesn't seem to want to stop-start Wrangler's dev server for Cloudflare development on its own, but for some reason one of the skills does that. Not having to manually troubleshoot HTTP 500 error codes is nice.
Ending thoughts
Song of the day: Lana Del Rey - Cola
Movie of the week: Over Your Dead Body
Apologies for any spelling errors or poorly-worded sentences, I'm just a human.