EVAL #004: AI Agent Frameworks — LangGraph vs CrewAI vs AutoGen vs Smolagents vs OpenAI Agents SDK
Every week there's a new AI agent framework on Hacker News. The GitHub stars pile up, the demo videos look magical, and six months later half of them are abandoned. We're deep enough into the agent era now that the field is consolidating. Five frameworks have emerged as the ones that actually matter — not because they're perfect, but because they have real users, active maintainers, and distinct design philosophies.
This issue breaks down LangGraph, CrewAI, AutoGen, Smolagents, and the OpenAI Agents SDK. No hype. No "it depends on your use case" cop-outs. Honest assessments of what each does well, where each falls short, and which one you should pick for what.
Let's get into it.
The Quick Comparison
| Framework | GitHub Stars | Latest Version | Last Release | Philosophy | Best For |
|---|---|---|---|---|---|
| AutoGen (MSFT) | 55.6K | v0.7.5 | Sep 2025 | Multi-agent conversation | Research, complex multi-agent |
| CrewAI | 46K | v1.10.1 | Mar 2026 | Role-based agent teams | Business workflows, rapid prototyping |
| LangGraph | 26.3K | v1.1.2 | Mar 2026 | Stateful agent graphs | Production systems, complex control flow |
| Smolagents (HF) | 26K | v1.24.0 | Jan 2026 | Code-first, minimal | HF ecosystem, code agents |
| OpenAI Agents SDK | 20K | v0.12.1 | Mar 2026 | Lightweight, opinionated | OpenAI-native apps, fast shipping |
Star count doesn't tell the whole story here. AutoGen leads in stars but hasn't shipped a release in six months. OpenAI Agents SDK has the fewest stars but is shipping multiple releases per week. Velocity matters more than vanity metrics.
Per-Framework Analysis
LangGraph — The Production Workhorse
Stars: 26.3K | Latest: v1.1.2 (Mar 12, 2026) | Lang: Python, JS
LangGraph is what you reach for when you need agents that actually work in production. Built by the LangChain team as the answer to "LangChain is too messy for real applications," it models agent workflows as stateful, cyclical graphs. Nodes are functions. Edges are conditional transitions. State is explicitly managed and persisted.
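The mental model is easy to see without installing anything. Here's a hand-rolled sketch of the same idea in plain Python (the names and structure are ours for illustration, not LangGraph's actual API): state is a dict, nodes are functions from state to state, and a router plays the role of conditional edges.

```python
# Hand-rolled sketch of the stateful-graph idea (illustration only, not
# LangGraph's API): nodes are functions state -> state, edges are chosen
# by a router, and execution loops until the router returns "END".

def draft(state):
    state["draft"] = f"draft of {state['topic']}"
    return state

def review(state):
    state["approved"] = len(state["draft"]) > 5  # toy acceptance check
    return state

def router(current, state):
    # Conditional edges: after "review", either finish or loop back to "draft".
    if current == "draft":
        return "review"
    if current == "review":
        return "END" if state["approved"] else "draft"

NODES = {"draft": draft, "review": review}

def run_graph(entry, state):
    node = entry
    while node != "END":
        state = NODES[node](state)   # run the node
        node = router(node, state)   # follow the conditional edge
    return state

final = run_graph("draft", {"topic": "agents"})
print(final["approved"])
```

LangGraph adds typed state, persistence, and streaming on top of this shape, but the core loop is the same, which is why the abstraction feels heavyweight for simple cases and pays off for complex ones.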
The good stuff: LangGraph gives you fine-grained control over every step of agent execution. You define exactly how state flows, when the agent loops, and where human-in-the-loop checkpoints go. The persistence layer is genuinely useful — you can pause an agent mid-execution, serialize its state, and resume it days later. LangGraph Platform provides deployment infrastructure with streaming, background tasks, and cron support. The v1.1 line has been stable and actively maintained.
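The pause/resume capability rests on state being explicit, serializable data. A conceptual sketch (these names are ours; LangGraph's real checkpointers handle this for you):

```python
import json

# Conceptual pause/resume sketch (illustration only; LangGraph's
# checkpointers do the real work). Each step transforms the state, and a
# checkpoint is just the serialized state plus the index of the next step.

STEPS = [
    lambda s: {**s, "draft": s["topic"] + " draft"},
    lambda s: {**s, "reviewed": True},
]

def run(state, start=0, stop=None):
    stop = len(STEPS) if stop is None else stop
    for i in range(start, stop):
        state = STEPS[i](state)
    return state, stop

# Run the first step, then "pause" by persisting state + position.
state, pos = run({"topic": "agents"}, stop=1)
checkpoint = json.dumps({"state": state, "next": pos})

# Days later: deserialize and resume exactly where we left off.
saved = json.loads(checkpoint)
final, _ = run(saved["state"], start=saved["next"])
print(final["reviewed"])
```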
The honest problems: The learning curve is real. If you want a quick agent prototype, LangGraph is overkill. The graph abstraction forces you to think in nodes and edges even when a simple loop would do. It's still coupled to the LangChain ecosystem more than it should be — you'll pull in langchain-core whether you want to or not. Documentation has improved significantly but the API surface is large.
Verdict: The right choice for teams building production agent systems that need reliability, observability, and control. Not the right choice for a weekend hackathon.
CrewAI — The Popular Kid
Stars: 46K | Latest: v1.10.1 (Mar 4, 2026) | Lang: Python
CrewAI exploded in popularity because it nailed the mental model: define agents with roles, give them goals, organize them into crews, and let them collaborate. It's intuitive in a way that most frameworks aren't. You think "I need a researcher agent and a writer agent," and CrewAI maps directly to that.
The good stuff: Fastest time-to-working-prototype of any framework on this list. The role/goal/backstory agent definition is surprisingly effective at shaping LLM behavior. Built-in support for sequential and hierarchical task execution. The enterprise platform (CrewAI+) adds monitoring, deployment, and team features. The v1.10 release added Gemini GenAI upgrades, A2A (Agent-to-Agent) protocol support for Jupyter environments, and MCP tool loading improvements. Very active development cadence.
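To see why the mental model maps so directly, here's a toy version of the role/goal/backstory pattern in plain Python (our names, not CrewAI's API): each persona compiles into a system prompt, and the crew runs tasks sequentially, feeding each output forward.

```python
# Toy sketch of the role/goal/backstory pattern (illustration only, not
# CrewAI's API): a persona compiles into a system prompt, and a "crew"
# runs tasks sequentially, handing each task's output to the next agent.

def make_agent(role, goal, backstory):
    system_prompt = f"You are a {role}. {backstory} Your goal: {goal}"
    def act(task, context):
        # Stand-in for an LLM call conditioned on the persona prompt.
        return f"[{role}] {task} (given: {context or 'nothing'})"
    act.system_prompt = system_prompt
    return act

def run_crew(agents, tasks):
    context = ""
    for agent, task in zip(agents, tasks):
        context = agent(task, context)  # sequential hand-off of outputs
    return context

researcher = make_agent("researcher", "find sources", "You dig deep.")
writer = make_agent("writer", "draft the post", "You write clearly.")

result = run_crew([researcher, writer],
                  ["research agent frameworks", "write the summary"])
print(result)
```

The real framework adds delegation, hierarchical execution, and tool use, but "personas plus a task pipeline" is the whole conceptual surface, which is why prototypes come together so fast.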
The honest problems: CrewAI abstracts away too much for complex use cases. When agents need to share nuanced state, or when you need precise control over conversation flow, you hit walls. The "crew" metaphor breaks down for workflows that aren't neatly decomposable into sequential tasks. Performance can be unpredictable — agents sometimes have circular conversations that burn tokens without progress. The gap between the open-source core and the enterprise platform keeps growing, which is a strategic risk for users.
Verdict: Best choice for business-oriented teams that need to ship agent workflows quickly and can live with some loss of control. Think "AI-powered content pipelines" or "automated research assistants."
AutoGen — The Research Giant
Stars: 55.6K | Latest: v0.7.5 (Sep 2025) | Lang: Python, .NET
AutoGen is Microsoft's entry, and it has the most stars of any agent framework on GitHub. The core idea is multi-agent conversations: agents talk to each other (and to humans) in structured or freeform dialogue to solve problems. Microsoft Research built it, and it shows — the design choices favor flexibility and experimentation over production simplicity.
The good stuff: The most sophisticated multi-agent conversation patterns of any framework. Supports complex topologies — group chats, nested conversations, teachable agents. The v0.7 rewrite (AutoGen 0.7/AgentChat) cleaned up the architecture significantly with an event-driven, distributed runtime. .NET support is a differentiator for enterprise shops. Thinking mode support for Anthropic models landed in the latest patches. RedisMemory now supports linear memory patterns.
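Stripped of the distributed runtime, the conversation pattern at AutoGen's core looks something like this hand-rolled sketch (our names, not AutoGen's API): agents take turns appending to a shared transcript until one of them signals termination.

```python
# Hand-rolled sketch of the multi-agent conversation idea (illustration
# only, not AutoGen's API): agents take turns on a shared transcript
# until one emits a termination signal or the turn budget runs out.

def critic(transcript):
    return "found a bug" if len(transcript) < 3 else "looks good, TERMINATE"

def coder(transcript):
    return "here is a fix" if "bug" in transcript[-1] else "TERMINATE"

def group_chat(agents, opening, max_turns=10):
    transcript = [opening]
    for turn in range(max_turns):
        speaker = agents[turn % len(agents)]   # round-robin speaker selection
        message = speaker(transcript)
        transcript.append(message)
        if "TERMINATE" in message:             # conversation-level stop signal
            break
    return transcript

log = group_chat([critic, coder], "please review this patch")
print(log)
```

AutoGen's value is everything layered on top of this loop: smarter speaker selection, nested chats, human participants, and a distributed event-driven runtime.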
The honest problems: Here's the elephant in the room — the last release was September 2025. That's six months of silence as of this writing. For a framework in a space moving this fast, that's concerning. The 0.4-to-0.7 transition created ecosystem fragmentation and confused users. The API is powerful but verbose. Setting up a basic two-agent conversation requires more boilerplate than it should. The docs assume you're already comfortable with distributed systems concepts.
Verdict: If Microsoft recommits and ships 0.8, AutoGen could be the most capable framework on this list. Right now, the release stall puts it in "watch and wait" territory for new projects. Existing AutoGen users should evaluate alternatives for greenfield work.
Smolagents — The Minimalist
Stars: 26K | Latest: v1.24.0 (Jan 16, 2026) | Lang: Python
Hugging Face's Smolagents takes the opposite approach from everyone else: agents should write and execute code, not chain together API calls. The name says it all — this is a small, focused library that doesn't try to be a platform. Agents receive tools, reason about them, and produce Python code to orchestrate their use.
The good stuff: The code-generation approach is genuinely interesting and often more efficient than ReAct-style tool calling. Less abstraction means fewer surprises — you can read the entire codebase in an afternoon. Tight integration with the Hugging Face ecosystem (models, datasets, Spaces). Multi-agent support via ManagedAgent. The v1.24 release added backward compatibility for the deprecated HfApiModel and expanded the model support list. No vendor lock-in — works with any LLM that can generate code.
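The core idea fits in a few lines. This toy sketch (our names, not Smolagents' API) shows the shape: the model emits Python that orchestrates its tools, and we execute it in a namespace that exposes only those tools. The bare exec() here is illustration only; real deployments need the sandboxing discussed below.

```python
# Toy sketch of the "agents write code" idea (illustration only, not
# Smolagents' API): the model generates Python that orchestrates its
# tools, and we execute it in a namespace exposing only those tools.
# exec() without a sandbox is fine for a toy, never for production.

def search(query):
    return f"results for {query}"

def word_count(text):
    return len(text.split())

# Stand-in for the LLM: in a real code agent the model writes this itself.
generated_code = """
hits = search("agent frameworks")
answer = word_count(hits)
"""

namespace = {"search": search, "word_count": word_count}
exec(generated_code, namespace)
print(namespace["answer"])
```

One code block can compose several tool calls with loops and conditionals, which is where the efficiency win over one-tool-call-per-LLM-round ReAct loops comes from.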
The honest problems: The "agents write code" paradigm is powerful but risky. Sandboxing is essential and the secure execution story is still maturing. Fewer guardrails mean you need more LLM skill to build robust agents. The release cadence has slowed — Jan 2026 was the last release, which is a two-month gap. Community is smaller than the bigger frameworks, so you'll find fewer tutorials and examples. The "smol" philosophy means features you might want (complex state management, built-in persistence, deployment tooling) simply don't exist.
Verdict: Perfect for ML engineers already in the HF ecosystem who want code-executing agents without framework bloat. Not the right fit if you need enterprise features or a batteries-included experience.
OpenAI Agents SDK — The New Challenger
Stars: 20K | Latest: v0.12.1 (Mar 13, 2026) | Lang: Python
Launched in early 2025 as the successor to Swarm, the OpenAI Agents SDK is the youngest framework here and it's moving fast. Very fast. Three releases in the last five days as of this writing. The design philosophy is clear: provide just enough structure for multi-agent workflows without the complexity overhead.
The good stuff: Clean, minimal API surface. Agents, handoffs, guardrails, and tracing — that's basically the whole framework. Built-in support for OpenAI's tool calling, function calling, and model capabilities without adapter layers. The v0.12 release added opt-in retry settings for model API calls, which is a sign of production-minded thinking. Human approval flows got improvements in v0.12.1 with preserved rejection messages across resume flows. Excellent if you're building on OpenAI models anyway.
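That small surface is simple enough to sketch by hand (our names, not the SDK's API): a guardrail screens input, and an agent can return a handoff that routes the request to a specialist.

```python
# Hand-rolled sketch of the agents/handoffs/guardrails shape (illustration
# only, not the OpenAI Agents SDK's API): a guardrail screens input, and
# an agent can return a handoff that routes the request to a specialist.

def no_secrets_guardrail(text):
    if "password" in text.lower():
        raise ValueError("guardrail tripped: possible credential in input")

def triage_agent(request):
    # Stand-in for the LLM deciding whether to hand off.
    if "refund" in request:
        return ("handoff", "billing_agent")
    return ("answer", "triage handled it")

def billing_agent(request):
    return ("answer", "refund issued for: " + request)

AGENTS = {"billing_agent": billing_agent}

def run(request, agent=triage_agent):
    no_secrets_guardrail(request)
    kind, payload = agent(request)
    if kind == "handoff":                 # follow the handoff chain
        return run(request, AGENTS[payload])
    return payload

print(run("please process my refund"))
```

The real SDK wires tracing through every step of that chain, which is most of what you need for debugging multi-agent flows in production.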
The honest problems: The vendor lock-in is real. Yes, you can use it with other providers via LiteLLM integration, but it's designed around OpenAI's API shape. Still pre-1.0, which means breaking changes are expected and happening frequently. The pace of releases (12 minor versions in ~12 months) means the API surface hasn't fully stabilized. Less mature than LangGraph or CrewAI for complex orchestration patterns. No built-in persistence or state management beyond what you implement yourself.
Verdict: If you're building on OpenAI models and want to ship fast with minimal ceremony, this is your framework. The release velocity is a strong signal of commitment. Watch for 1.0 stabilization before betting production systems on it.
The Recommendation Matrix
"I need production agents yesterday" → LangGraph. Most mature, best persistence story, most control.
"I need a prototype by Friday" → CrewAI. Fastest from zero to working multi-agent system.
"I'm an ML engineer who wants code agents" → Smolagents. Minimal, code-first, no bloat.
"My company is an OpenAI shop" → OpenAI Agents SDK. Thinnest abstraction over the models you're already using.
"I need sophisticated multi-agent research systems" → AutoGen, if you can accept the release uncertainty. LangGraph as the safer alternative.
"I'm building for enterprise with .NET" → AutoGen. Its .NET support makes it the only real option here.
"I want maximum flexibility and don't mind complexity" → LangGraph. The graph abstraction can model anything.
"I just want the simplest thing that works" → OpenAI Agents SDK or Smolagents. Both optimize for minimal abstraction.
The Changelog
What shipped this week and what matters:
- LangGraph v1.1.2 (Mar 12) — Remote graph API context support, stream part generic ordering fix. Stability-focused release.
- LangGraph CLI v0.4.16 (Mar 12) — Tooling improvements for LangGraph Platform deployment.
- OpenAI Agents SDK v0.12.1 (Mar 13) — Preserved approval rejection messages across resume flows. Small but important for human-in-the-loop patterns.
- OpenAI Agents SDK v0.12.0 (Mar 12) — Opt-in retry settings for model API calls via ModelSettings. Production-readiness signal.
- CrewAI v1.10.1 (Mar 4) — Gemini GenAI upgrade, A2A Jupyter support, MCP tool loading fixes, thinking model output surfacing.
- CrewAI v1.10.2a1 (Mar 11) — Alpha with upcoming features. Shows active development pipeline.
- AutoGen v0.7.5 (Sep 2025) — Anthropic thinking mode, RedisMemory linear memory, Bedrock streaming fixes. Last release six months ago.
- Smolagents v1.24.0 (Jan 2026) — HfApiModel backward compat, expanded model support list. Last release two months ago.
The pattern is clear: LangGraph and OpenAI Agents SDK are in active rapid development. CrewAI is shipping steadily. AutoGen and Smolagents have slowed. In a space evolving this fast, release velocity is a leading indicator of framework health.
The Signal
1. Agent-to-Agent (A2A) protocol is becoming table stakes. CrewAI's v1.10 added A2A support. Google launched A2A as an open protocol. This is the early standardization phase — frameworks that don't support inter-agent communication across boundaries will get left behind. Watch for LangGraph and OpenAI to announce A2A support in the next quarter.
2. The "MCP + Agents" stack is consolidating. Pairing Model Context Protocol (MCP) for tool integration with an agent framework for orchestration is becoming the default architecture. CrewAI now loads MCP tools natively. Smolagents has supported MCP for some time. This layering — MCP for tool plumbing, agent framework for orchestration — is likely the pattern that wins.
3. Microsoft's agent strategy is fragmenting. AutoGen's release stall coincides with Microsoft pushing Copilot Studio, Semantic Kernel, and Azure AI Agent Service. The question isn't whether Microsoft cares about agents — they obviously do — it's whether AutoGen continues to be the OSS vehicle for that strategy or gets quietly sunset in favor of proprietary tooling. If you're evaluating AutoGen for a new project, this ambiguity is a real risk factor.
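The layering in point 2 can be sketched in plain Python (toy names, not the real MCP SDK): the tool layer advertises a catalog with descriptions, and the orchestration side only ever consumes that catalog rather than hard-coding tools.

```python
# Toy sketch of the MCP-style layering (illustration only, not the real
# MCP protocol or SDK): a tool server advertises a catalog of tools, and
# the agent-framework side consumes the catalog instead of hard-coding
# tools, so the two layers can evolve independently.

class ToolServer:
    """Stands in for an MCP-like server: lists tools and executes calls."""
    def __init__(self):
        self._tools = {}

    def register(self, name, description, fn):
        self._tools[name] = {"description": description, "fn": fn}

    def list_tools(self):
        # The catalog is the contract between the two layers.
        return {n: t["description"] for n, t in self._tools.items()}

    def call(self, name, **kwargs):
        return self._tools[name]["fn"](**kwargs)

server = ToolServer()
server.register("add", "add two numbers", lambda a, b: a + b)

# The orchestration side: pick a tool from the catalog and invoke it.
catalog = server.list_tools()
tool_name = next(iter(catalog))
result = server.call(tool_name, a=2, b=3)
print(result)
```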
The agent framework space is maturing. The build-vs-buy decision is getting clearer: pick the framework that matches your control needs and ship. The frameworks that survive will be the ones that find the right balance between abstraction and control, between developer experience and production readiness.
Stop evaluating. Start building.
— EVAL
EVAL is a weekly newsletter covering AI engineering tools, frameworks, and practices with zero fluff.
Subscribe at buttondown.com/ultradune to get EVAL in your inbox every week.
Read past issues and explore the research at github.com/softwealth/eval-report-skills
If this was useful, forward it to someone who's drowning in agent framework choices.