7 LLM inference APIs, ranked for agent use. Groq still wins on speed -- but not by much.
We just added 25 more tools to the registry, bringing the total to 75 rated. This batch focused on LLM inference APIs, because that's where the biggest agent-operability gaps are hiding.
Here's how the 7 LLM API providers we scored stack up:
Groq -- 84/100 (still #1 for speed) ~500 tokens/second. OpenAI-compatible. Generous free tier. The fastest drop-in for agentic loops where latency compounds -- 10 LLM calls generating ~250 tokens each at 500 tok/s vs 100 tok/s is the difference between a 5-second and a 25-second workflow.
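To see why throughput compounds in agent loops, here's a back-of-envelope sketch. It assumes ~250 output tokens per call and a steady generation rate, and ignores network overhead and time-to-first-token -- the numbers are illustrative, not benchmarks:

```python
def workflow_seconds(num_calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Total generation time for a sequential agent loop.

    Assumes every call generates tokens_per_call tokens at a steady
    tokens_per_second rate; network latency and TTFT are ignored.
    """
    return num_calls * tokens_per_call / tokens_per_second

# A 10-step agent loop emitting ~250 tokens per step:
fast = workflow_seconds(10, 250, 500)  # ~500 tok/s (Groq-class)
slow = workflow_seconds(10, 250, 100)  # ~100 tok/s
print(fast, slow)  # 5.0 25.0
```

The point isn't the exact numbers -- it's that in a sequential loop, per-call generation time multiplies by the number of steps, so throughput differences that feel small per call dominate end-to-end latency.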
Google AI (Gemini) -- 84/100 (tied) Gemini 2.0 Flash is legitimately fast and cheap. The 1M-token context window is useful for long-document workflows. The API is functional -- not as elegant as OpenAI's, but clean. Ties with Groq largely on free-tier generosity.
Mistral -- 85/100 (the underdog winner) Edges out both Groq and Google because the developer experience is genuinely clean: SDK quality is excellent, pricing is transparent, and Mistral Small is the best cost/quality tradeoff for high-volume agent tasks. European data residency is a bonus for compliance-sensitive workflows.
Perplexity -- 83/100 More niche than the others -- Perplexity's value is the Sonar model with built-in web search. Useful for agents that need real-time information without a separate search step. Scores lower on pure inference because that's not the primary use case.
Cohere -- 82/100 Strong embedding and reranking APIs that agents actually use. Command R+ is solid for RAG pipelines. Slightly lower score because developer onboarding has more friction than Groq or Mistral.
Cerebras -- 81/100 Extremely fast (900+ tok/s on Llama 3) but rate limits are tight on the free tier. Narrow use case -- best for ultra-latency-sensitive loops. Not the default choice but the right one when you need raw throughput.
OpenAI -- 85/100 (still the benchmark) The reference implementation. Every other API is OpenAI-compatible because OpenAI defined the standard. Scores 85, not 90+, because of rate-limit friction, no free tier, and slower inference than Groq/Cerebras.
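"OpenAI-compatible" is a concrete claim: these providers accept the same chat-completions request body, so switching is usually just a base URL and a model name. A minimal sketch of that shared payload -- the base URLs and model names below are assumptions drawn from provider docs, so verify them before use:

```python
import json

def chat_request(model: str, user_msg: str, temperature: float = 0.0) -> str:
    """Build an OpenAI-style chat-completions request body.

    The same JSON shape is accepted by Groq, Mistral, Cerebras, and
    Gemini's OpenAI-compatibility endpoint.
    """
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "temperature": temperature,
    })

# Swapping providers is a base-URL + model-name change.
# (URLs/models are assumptions -- check each provider's docs.)
PROVIDERS = {
    "openai":  ("https://api.openai.com/v1", "gpt-4o-mini"),
    "groq":    ("https://api.groq.com/openai/v1", "llama-3.1-8b-instant"),
    "mistral": ("https://api.mistral.ai/v1", "mistral-small-latest"),
}

base_url, model = PROVIDERS["groq"]
body = chat_request(model, "Summarize this ticket in one line.")
```

This is why the "reference implementation" framing matters: the compatibility layer means an agent framework written against one provider can be pointed at another without touching the request-building code.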
The insight: For agentic use cases, the "best" LLM API isn't the most capable model -- it's the one that minimizes latency and maximizes free-tier experimentation. Groq and Mistral win on both dimensions.
Also new this week:
- Cal.com -- 86/100 vs Calendly -- 55/100: Cal.com's open-source nature means full API access and self-hosting. Calendly locks down API access, requires OAuth for basic reads, and has no MCP server. The gap is wider than you'd expect.
- 3 new category pages.
Give your agent access to all 75 ratings:
```shell
claude mcp add --transport http agent-native-registry https://agentnativeregistry.com/api/mcp
```
Or in JSON config:
```json
{
  "mcpServers": {
    "agent-native": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://agentnativeregistry.com/api/mcp"]
    }
  }
}
```