Artanis Monthly Updates

January 6, 2026

Artanis #21: Zero-to-one (again!)

Building the Stacktrace for AI mistakes - previous updates at https://artanis.ai/

🙋 Ways you can help - intros to AI teams 🙋
We’d like to speak with people with the following profile:

1/ A technical leader, e.g. CTO, Head of AI, or Head of Engineering
2/ They have an LLM-based product on the market
3/ Their company is post-revenue but has fewer than 100 employees

Please get in touch if you know anyone who fits the bill!

💡 New product - Stacktrace for AI mistakes 💡
When code crashes, the stack trace shows exactly where in the codebase the problem originated. This makes it much faster for engineers to fix software mistakes, as they don’t need to dig through every line of the codebase. Stack traces are a critical part of the software development workflow.
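To make that concrete, here’s a tiny Python snippet (entirely hypothetical - parse_order and handle_request are made up for illustration) whose failure produces exactly this kind of trace:

```python
# Hypothetical example: parse_order expects an "order_id" key that is missing.
def parse_order(raw):
    return {"id": raw["order_id"], "total": raw["total"]}

def handle_request(payload):
    return parse_order(payload)

handle_request({"total": 42})
```

Running this crashes with a traceback ending in KeyError: 'order_id', listing handle_request, then parse_order, then the exact line where the lookup failed - the engineer knows immediately where to look.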

Fixing AI mistakes is much harder, as they’re not caught by the compiler. Instead, engineers find out about AI mistakes through human feedback: people - either users or internal team members - complain about the output. It’s time-consuming to find the precise cause of the mistake, as the root cause could be:

1/ An intermediate step of a complex pipeline. Engineers need to inspect the inputs and outputs of every step to figure out which one caused the problem.

2/ The data being given to the AI, such as out-of-date docs. In this case, engineers need to dig through the raw input data rather than their pipeline’s inputs and outputs.

3/ Human error - the person complaining was wrong and the AI was right. In this case, engineers often chase their tails looking for an AI problem that doesn’t exist.

We’re solving this problem by building a “Stacktrace for AI mistakes”. When people complain about AI output, we’ll automatically determine the root cause of the mistake from the buckets above. If it’s an AI problem, rather than a human mistake, we’ll also identify the specific part of the pipeline that caused it. This will free up engineering time to focus on fixing the AI problem, rather than digging through data.
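For the curious, here’s a rough sketch of the shape of such a system. Everything below is illustrative rather than our actual implementation: the bucket names, classify_complaint, and the two judgment-call helpers (stubbed out here) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class RootCause(Enum):
    PIPELINE_STEP = "pipeline_step"  # bucket 1: an intermediate step misbehaved
    INPUT_DATA = "input_data"        # bucket 2: the source data was wrong or stale
    HUMAN_ERROR = "human_error"      # bucket 3: the complaint itself was mistaken

@dataclass
class StepTrace:
    """One step of the AI pipeline: what went in and what came out."""
    name: str
    input: str
    output: str

def complaint_is_valid(complaint: str, final_output: str) -> bool:
    # Hypothetical judgment call, e.g. an LLM-as-judge checking whether the
    # complaint actually describes a flaw in the final output.
    raise NotImplementedError

def output_follows_from_input(step_input: str, step_output: str) -> bool:
    # Hypothetical judgment call: did this step transform its input faithfully?
    raise NotImplementedError

def classify_complaint(
    complaint: str, steps: list[StepTrace]
) -> tuple[RootCause, str | None]:
    """Bucket a complaint; if it's a pipeline problem, also name the step."""
    # Bucket 3 first: if the complaint doesn't hold up, there is no AI mistake.
    if not complaint_is_valid(complaint, steps[-1].output):
        return RootCause.HUMAN_ERROR, None
    # Bucket 1: walk the pipeline and flag the first step whose output
    # doesn't follow from its input.
    for step in steps:
        if not output_follows_from_input(step.input, step.output):
            return RootCause.PIPELINE_STEP, step.name
    # Bucket 2: every step behaved, so the underlying data must be at fault.
    return RootCause.INPUT_DATA, None
```

The ordering is the point of the sketch: rule out human error first, then try to localise the mistake within the pipeline, and only blame the input data once every step checks out.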

How did we get here? See the next section if you care for the details!

🔬 Customer discovery - the state of teams building AI products 🔬
We started December with three hypotheses about companies building AI products. Here’s what we learned over the course of 35 discovery calls.

1/ There’s a strong link between their AI accuracy and revenue.
This was correct, but framed incorrectly. AI accuracy is a priority for most teams, but none could quantify a direct link to revenue. Instead, the focus on accuracy is driven by concerns such as churn, winning pilots, brand, and legal compliance. Most of these are linked to revenue, so we’re happy to move forward with helping teams improve accuracy.

2/ The main bottleneck to better accuracy is a lack of robust measurement (or “evals”).
Not usually - instead, the most commonly cited bottleneck was a lack of engineering resources. While most teams think evals are important, they aren’t usually a current priority. The main challenge in writing better evals is the relationship with the domain expert, who often has neither the incentive to put time into this nor an awareness of its importance. The solution to this challenge is cultural, rather than something software can solve.

3/ Existing monitoring solutions can measure inputs and outputs, but not accuracy.
This was correct. There’s no dominant observability solution: a plurality (but not a majority) of teams build in-house. There’s also a long tail of third-party solutions (e.g. LangSmith, Langfuse), which teams often buy in the hope they’ll fix their problems with evals. However, we didn’t meet any teams where this worked in practice - they just used the I/O monitoring features and reverted to managing their evals separately.

📈 Progress in December - zero-to-one! 📈
Our metrics for December were:

Monthly revenue: $300
Customers: 1

Signing the first customer for a new product within a month was a big win! It was particularly fast, given that the run-up to Xmas is usually poor timing for closing deals. It also feels repeatable: we had good results with our outbound motion, booking 39 discovery calls in 3 weeks through a mix of warm intros and cold calling. This gives us confidence to commit further to our current route.

🏹 Goal for January - grow from 1 to 2 customers 🏹
We’re aiming to grow from 1 to 2 customers. To do so, we’ll need to both i) retain our first customer, and ii) acquire a new one. Our main hypotheses have now changed to:

1/ Actioning human feedback on AI output quality is a significant resource commitment.
If this is true, teams will pay for solutions that save them time.

2/ Our “Stacktrace for AI mistakes” makes it faster for AI teams to action human feedback.
There are two parts to this: i) we’ll sign new deals if prospects believe it, and ii) we’ll retain existing customers when we’ve deployed this in practice.

🙏 Shout-outs 🙏
Special thanks for December go to:

Chidi W - for being customer #1
Lorenzo S - for being an intro MVP
Ollie T - for a good chat on buttons vs text input!
Aleks M - for the quote about gradients
Zahid M - for the intro
Sunny Z - for the chat on compliance
Jeff K - for the industry insights
Henry M - for the deep dive into your website time machine
David S - for the legal tech call
Laura R - for the extra hustle
Sergey C - for being such an avid update reader!
Nick E - for providing some pre-LLM perspective
Ben L - for the intro to Pravin
Jeylani J - for the look into where CS agents get tricky
Tom V - for the AI grading catch-up
David R - for holding us accountable
Nisarg M - for the Raft intro

En Taro Tassadar,
Artanis Team
