This Week in Vibecoding

February 24, 2026

Managing Tasks and Reviewing Code

Hi! Apparently this email was published but never sent out before I went on holiday, so it's not super fresh anymore. Look out for the next newsletter in a few days!

Task tracking across many contexts

Agents now handle longer, more complex tasks.
In Claude Code this is made possible by the planning phase (a markdown file that guides the agent) and the Tasks tool (TODOs with dependencies).

But how can one handle tasks that are longer than one context window? We need other tools!
People work around this manually: write the plan to a file, complete one task, delete it, start a fresh agent and repeat the process.
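That manual loop can be sketched in a few lines of Python. This is only an illustration of the workflow, not any real tool: the plan-file name is made up, and `run_agent` is a stub standing in for launching a fresh agent session (e.g. a new Claude Code run) on a single task.

```python
from pathlib import Path
import tempfile

def run_agent(task: str) -> str:
    # Stand-in for starting a fresh agent session with a clean context
    # on exactly one task from the plan.
    return f"done: {task}"

def drain_plan(plan_file: Path) -> list[str]:
    """Complete the top task, delete it from the plan, repeat."""
    results = []
    while (lines := plan_file.read_text().splitlines()):
        task, rest = lines[0], lines[1:]
        results.append(run_agent(task))        # fresh session per task
        plan_file.write_text("\n".join(rest))  # delete the finished task
    return results

# Example: a throwaway plan file with three tasks, one per line.
plan = Path(tempfile.mkdtemp()) / "plan.md"
plan.write_text("add login form\nwire up API\nwrite tests")
print(drain_plan(plan))
```

Each iteration mirrors one "fresh agent" round: read the remaining plan, hand the top task to a new session, and rewrite the file without it.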
This works but is quite tedious, so David Cramer (co-founder of Sentry) wrote dex, a task-tracking plugin for agents.
It supports:

  • Persistent tickets that survive beyond the current session, stored as files in the repo.

  • Syncing to GitHub Issues.

I used it for two features and liked it.
Planning and building a comprehensive backlog of tasks before starting any implementation worked quite well.

A related approach is Compound Engineering by every.to.
I tried it for one feature, and it is very comprehensive.
That is, it's a whole development methodology for using agents (mainly Claude Code) to develop actual production software.
It sets up four workflows: plan, work, review and compound.
The last one is the special step: its purpose is to update the software's documentation, so that future agents write code that fits into the architecture.
The four steps fit well into each other but burn a lot of tokens.
For example, the review step includes 12 subagents for different perspectives (performance, security, over-engineered abstractions,…).
This captures the downside: using Compound Engineering is involved and time-consuming, so I would only do it for meaningful features.
It doesn't make sense for small, one-off changes.

Other Resources, Articles, Releases

  • Warden by Sentry is an open-source CLI for AI-based code review: basically, an alternative to all the startups in this space (e.g. Greptile). It lets you run an LLM to check new changes against an agent skill (see the last newsletter), such as vercel-react-best-practices.

  • You have probably heard of OpenClaw (previously ClawdBot), a self-hosted digital personal assistant (agent) based on any LLM, such as Opus 4.5. I haven't installed it, so I'll be brief. But it's an assistant on steroids: people give it access to the internet, their emails, files, and all kinds of APIs. With all this and a powerful model, it can accomplish real things. One example: tasking the bot to find and negotiate a used car. The agent searched the internet for relevant cars, contacted the dealerships via email, and actually negotiated the price. For anyone trying it: don't install it in your network, and don't give it too much access; it's a disaster waiting to happen.

  • Anthropic:

    In a randomized controlled trial, we examined 1) how quickly software developers picked up a new skill (in this case, a Python library) with and without AI assistance; and 2) whether using AI made them less likely to understand the code they’d just written. We found that using AI assistance led to a statistically significant decrease in mastery. On a quiz that covered concepts they’d used just a few minutes before, participants in the AI group scored 17% lower than those who coded by hand, or the equivalent of nearly two letter grades. Using AI sped up the task slightly, but this didn’t reach the threshold of statistical significance.

    Importantly, using AI assistance didn’t guarantee a lower score. How someone used AI influenced how much information they retained.

    In a way, this was very obvious. Still, it is a good reminder not to just click “accept all edits” and move on once the code seems to work. To actually become skillful, we need to force ourselves to understand, study, and learn.

  • Peter Steinberger, creator of OpenClaw, was on The Pragmatic Engineer podcast. Some interesting bits:

    • He swears by GPT Codex: it “thinks” for much longer and comes back with a better solution than Opus 4.5 could.

    • Because Codex takes longer and doesn't constantly ask for clarifications (as Opus 4.5 does), he is able to run agents in parallel.

    • He no longer reads every line of code, but spends a lot of time thinking about architectural decisions (which he can think through thanks to his vast software experience).

  • antirez, the creator of Redis, is big on LLM-assisted coding. He makes an important distinction between just vibecoding and the other way: the engineered, planned, and vision-led approach to AI programming. Unfortunately, this second way does not have a catchy name yet, but he tries to establish the name automatic programming:

    if vibe coding is the process of producing software without much understanding of what is going on […], automatic programming is the process of producing software that attempts to be high quality and strictly following the producer's vision of the software […], with the help of AI assistance. Also a fundamental part of the process is, of course, what to do.
