As an individual contributor working across multiple streams, I work more with LLMs than with humans. This working relationship is analogous to a manager and their direct report.
From my experience collaborating with them over the past few months, I think the performance has been sub-par. I had to be intentional about how I used these LLMs to deliver valuable output.
I have had some exposure to managing people, but LLMs are different. Methods like teaching and training aren't effective. Instead, you need to think in terms of memory and context.
My super boss uses the frame of eager 22-year-olds (keen to do the job) when working with LLMs. I like the same framing, because we are using more than one LLM context window to get any task done.
So, I am going back to the principles of management laid out by Andrew S. Grove in "High Output Management".
Let's start with the most important task of a manager.
“The single most important task of a manager is to elicit peak performance from his/her subordinates”
When working with LLMs as their manager, you end up spending a lot of time delivering the right context and praying that you don't run out of memory in the chat window. If the context is corrupted, the output fails immediately. And if memory runs out, random gibberish is surfaced. In both cases, evaluating the performance requires a slew of tools (prompts for testing) that you need to custom-build.
Take my role as a product manager. If I were managing other product managers and we were tasked with writing a new bet (feature) document, I would ask them to look at previous bets to see how we did it in the past. Then I would deliberate with them about the problem we are solving. Finally, I would ask them to take a stab at writing the bet.
To replicate this with LLMs, I first have to write an essay-length system prompt. I specifically instruct the model to treat my previous documents not as source content but as examples of the process and format. In spite of this, the output can be prone to error. The challenging difference between the two situations is that in one the work comes back in days or weeks, while in the other it takes minutes. So evaluating the output of LLMs becomes taxing.
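For the curious, this is roughly what that priming looks like in practice. This is a minimal sketch assuming an OpenAI-style chat API; the file paths, model name, and prompt wording are placeholders, not my actual setup.

```python
# Minimal sketch: prime the model with previous bets as format examples only.
# Assumes the `openai` Python client; paths and model name are placeholders.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Concatenate earlier bet documents so the model can learn the shape of a bet.
previous_bets = "\n\n---\n\n".join(
    p.read_text() for p in Path("bets/").glob("*.md")
)

system_prompt = f"""You are helping a product manager draft a new bet (feature) document.
The documents below are PREVIOUS bets. Use them only to learn the structure,
tone, and level of detail we expect. Do NOT reuse their content, problems,
or conclusions in the new bet.

{previous_bets}
"""

draft = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Here is the problem we are solving: <notes go here>. Take a stab at the bet."},
    ],
)
print(draft.choices[0].message.content)
```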
“A manager has two ways to improve performance 1) training; 2) motivation”
Neither of these works for LLMs. As users of foundation models like Gemini, OpenAI's models, or Claude, we are only priming them with our own context using custom RAG (retrieval-augmented generation) setups. As for motivation, no human is as motivated as an LLM to impress. When asked, they respond; when tasked, they attempt; and when questioned, they reply, even if they botch it up completely.
So, clearly, the management principles are failing on the face of it. For the first time, we have reports that produce work, copious amounts of it, for every task we ask. The job is to evaluate that work and become more intentional and direct.
Managing Humans vs LLMs
How does my job change when working with LLMs?
Going back to the bet-scoping example: I now dump all the messages where I thought about the feature into a single window. I ask the model to generate an output based on the templates I have given for bet documentation. I then take those outputs and run them through another set of system prompts that tell me what I am missing in terms of writing a good bet. I take that feedback, ask for it to be added to the initial bet document, and then spend time going back and forth, iterating until I have an output that can be presented to the team. All of this can be done in a single session of a couple of hours, unlike days with humans.
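Here is a minimal sketch of that draft-critique-revise loop, again assuming an OpenAI-style chat API; the prompts, model name, and iteration count are illustrative rather than my actual setup.

```python
# Sketch of the loop: draft from notes, critique with a second prompt, revise.
# Assumes the `openai` Python client; prompts and iteration count are illustrative.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder model name

def ask(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

notes = "<all the messages where I thought about the feature>"
bet_template = "<the bet documentation template>"

# First pass: turn raw notes into a draft that follows the template.
draft = ask(f"Write a bet document using this template:\n{bet_template}", notes)

# Back and forth: a second set of prompts critiques the draft, then it is revised.
for _ in range(3):
    critique = ask("You review bet documents. List what is missing or weak "
                   "in terms of writing a good bet.", draft)
    draft = ask(f"Revise this bet document to address the critique:\n{critique}", draft)

print(draft)
```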
Yet sometimes sitting with a problem is better. While humans are capable of doing that, LLMs are just waiting for you to restart the conversation with a new message in the same context window.
My own maturity in using these tools has improved immensely in the last few months, but I am far from being a good manager of them. The "How to Manage LLMs" book is being written somewhere, and I look forward to getting my hands on it.
Round up
My gripe with chat interfaces has found words from more qualified folks. In legacy domains like logistics and agriculture, where the front-line workforce does the essential work but has no incentive to change, chat is the wrong medium.
This piece by Julian lays out why chat as an input is a step backwards.
LLMs don’t solve this problem. The quality of their output is improving at an astonishing rate, but the input modality is a step backwards from what we already have. Why should I have to describe my desired action using natural language, when I could simply press a button or keyboard shortcut? Just pass me the goddamn butter
An example of what success could look like comes in the form of this company post documenting how the data analysis function will be upended. The purpose of AI here is to generate more insights that can be acted upon.
This creates a "barbell" structure rather than a pyramid. At the bottom, engineers build and maintain the data infrastructure that feeds AI systems. In the middle, AI handles insight generation, which gets productized rather than being human-intensive. At the top, more decision-makers consume insights directly and act on them.
The "vast middle" gets automated and productized. The human roles that remain are either highly technical (data engineering) or highly strategic (decision-making). The interpretive layer in between becomes software.
Ellen Köenig is a fellow Commoncog member whom I happened to meet recently. It was an interesting conversation spanning multiple topics. This post talks about her previous life as a data engineer. I have benefited immensely from putting her advice into practice.
Sign off
Not having an RSS feed reader on my phone is the reason there is only one recommendation this week. It has led to a complete collapse of my daily blog reading.
I am now working on putting that time to good use. Who am I kidding, I have failed miserably so far.
Hoping to write more in the coming months with my newfound time.