Weekly API Evangelist Governance (Guidance) For March 12th, 2026
Following up on last week's newsletter about MCP and the evolution of documentation, I have spent a lot of time looking at Agent Skills. As with MCP, I am finding ways to get more excited about Agent Skills, despite my skepticism and concern around artificial intelligence at large, and despite the potential downside of these very transactional and tactical approaches to defining integrations in a non-deterministic way.
To help me like Agent Skills I see them as little stories. I like stories. I like the readability of skills. I like how they break down problems. I have lots of worries about their ad hoc, autogenerated, and distributed nature, but your average enterprise is already a mess, and if these stories provide little snackable pieces of guidance folks can use in the moment--I can get on board. I don't have any grand delusions of fixing things, and I am working my way through the spec and each provider's approach to see where I can contribute to make people's lives easier.

The Agent Skills Standard
The Agent Skills specification developed by Anthropic is dominating the discussion right now. The premise is simple: a skill is a folder with a SKILL.md file, a YAML frontmatter block with a name and description, and whatever supporting material the skill needs to accomplish a specific job. That's it. No runtime, no SDK required, no vendor lock-in baked in at the format level. Write once, use everywhere, and based on the eight repos I surveyed, Agent Skills are already being used across Claude Code, Cursor, Codex, Gemini CLI, Windsurf, and a dozen other IDE and agent client solutions.
What strikes me about the spec is how much it gets right by leaving things out. The description field is the only required piece of intelligence: it's what agents read to decide whether a skill is relevant to the conversation. The body of the SKILL.md only loads after that trigger fires. This two-stage loading pattern is interesting, and something every major vendor has adopted. The spec doesn't mandate how you organize references, scripts, or assets, and yet, in practice, every repo has converged on nearly identical conventions. It kind of reminds me of the looseness that Anil Dash recently talked about around why Markdown became successful.
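To make the premise concrete, here is a minimal sketch of what a SKILL.md might contain. The skill name, description, and steps are hypothetical, written to illustrate the format rather than copied from any repo:

```markdown
---
name: weekly-report
description: Generate a weekly status report from a project's git history. Use when the user asks to summarize recent commits or team activity.
---

# Weekly Report

1. Run `scripts/collect.sh` to gather the last seven days of commits.
2. Group the commits by author and summarize each group in plain language.
3. Format the summary as Markdown, following `references/report-format.md`.
```

The description carries all the triggering intelligence; everything below the frontmatter only loads once an agent decides the skill is relevant.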

Scripts
Scripts are one of the three core building blocks of a skill, alongside the SKILL.md definition and reference documentation. They live in a scripts/ subdirectory and represent executable code that the agent can invoke directly: Python for data work and API calls, Bash for deployment and system operations. The key insight here is that storing scripts in the skill prevents an agent from having to re-derive the same logic from scratch every time a user asks for it. OpenAI is the most explicit about this, noting that deterministic code should be preserved and not regenerated; their Sora skill, for instance, carries a comment that the script should "never be modified unless the user asks." This is something I'd like to see emulated across other aspects of a spec-driven bundle--drawing a clear line between the deterministic and the non-deterministic.
Vercel adds a layer of convention on top of Bash scripts: shebang required, set -e for fail-fast behavior, status output to stderr, JSON output to stdout, cleanup traps for temp files. These aren't arbitrary rules; they make scripts predictable across the different sandboxed environments (local Claude Code, claude.ai, Codex) that a single skill might run in. More grounding.
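As a sketch of what those conventions produce in practice, here is a hypothetical status script following Vercel's rules--none of this is taken from their repo, it just demonstrates the pattern:

```shell
#!/usr/bin/env bash
# Hypothetical skill script following the conventions above.
set -e  # fail fast: any command that errors stops the script

# cleanup trap: the temp file is removed no matter how the script exits
tmpfile="$(mktemp)"
trap 'rm -f "$tmpfile"' EXIT

emit_status() {
  # human-readable progress goes to stderr, so it never pollutes the payload
  echo "Checking deployment status..." >&2
  # machine-readable result goes to stdout as JSON
  printf '{"status":"ok","deployments":0}\n'
}

emit_status
```

Because status and payload are separated by stream, a caller can pipe stdout straight into a JSON parser while still surfacing progress to the user.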

References
References are the knowledge layer of a skill: supporting documentation that gets loaded on demand rather than stuffed into the main SKILL.md. They live in a references/ or content/ subdirectory and cover deep dives, framework-specific guides, API catalogs, known limitations, and anti-pattern documentation. The progressive disclosure model is key: the skill's main file stays under 500 lines, and the agent pulls in reference material only when a specific workflow or edge case demands it--keeping token consumption reasonable while still providing deeper access when it matters.
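Putting the tiers together, the layout that repos have converged on looks roughly like this (directory and file names are hypothetical):

```
my-skill/
├── SKILL.md                # frontmatter + main instructions, kept under 500 lines
├── scripts/
│   └── collect.sh          # deterministic code the agent invokes rather than regenerates
├── references/
│   ├── api-catalog.md      # loaded only when a specific workflow needs it
│   └── limitations.md
└── assets/
    └── report-template.md  # applied directly, never read into context
```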
Cloudflare has taken references further than anyone else: their main cloudflare skill alone contains 63+ reference subdirectories covering the full breadth of the platform, from Workers and D1 to WAF, Stream, and the AI Gateway. That's 341 reference markdown files, organized into categories for compute, storage, AI, networking, security, and developer tools. Postman went in a different direction with just two reference files, but made them count: mcp-tools.md catalogs all 110+ available Postman MCP tools. It is interesting how everyone is using folder structure to organize references, and providing decision trees, but I wonder how this will work at higher levels.

Assets
Assets are the third tier of skill supporting material: templates, icons, brand files, and pre-built output formats that an agent can use without loading them into context. They live in an assets/ directory and are meant to be applied or output directly rather than read and interpreted. The distinction from references is important: references are documentation an agent reads to understand how to do something, while assets are files an agent uses to produce something. Anthropic's theme-factory skill, for example, bundles 10 pre-built theme definition files in a themes/ subdirectory; the agent doesn't need to understand every line of a theme file, it just applies one.
Microsoft's use of assets is the most production-ready of any repo. Their Azure SDK skills include assets/ directories with ready-to-use code templates--client_template.py, service_template.py, conftest_template.py--that agents can drop into a project without modification. This is a meaningful shift in how you think about code generation: rather than asking an agent to derive patterns from scratch every time, you give it a correct, production-grade template and let it adapt, reducing hallucination risk and producing more consistent output. OpenAI's approach to assets is similar, allowing icons and brand colors in agents/openai.yaml for UI rendering, keeping presentation concerns completely separate from the agent-facing SKILL.md.

Anthropic's Skills
Anthropic's own skills repository tells an interesting story about how they see skills. The 17 skills span document processing (PDF, DOCX, XLSX, PPTX), creative work (algorithmic art, theme-factory, canvas design), enterprise communication, and technical development, but what makes the repo compelling as a reference isn't the individual skills--it's the two meta-skills. The skill-creator skill runs to over 2,100 lines and walks through a four-phase process: capture intent, interview the user, write the skill, test it. I like meta-skills. The guidance is to avoid ALWAYS and NEVER in all caps, explain the reasoning behind constraints, prefer the imperative form, and challenge every sentence. The claude-api skill is another one that stands out for me, providing a single skill branching into Python, TypeScript, Go, Java, PHP, Ruby, C#, and curl, showing how the format scales beyond simple single-purpose cases. Although I don't quite get the language separation, I'm guessing that is on me. The marketplace.json groups skills into logical bundles rather than listing them flat, which is a different distribution model than the other repos are using.

Cloudflare Skills
Cloudflare's approach is the most sophisticated of any repo I looked at (like their MCP approach). They're the only team to formalize three distinct artifact types: Skills (auto-loaded from conversation context), Commands (user-invoked slash commands like /cloudflare:build-agent), and Rules (file-glob triggered when specific file patterns are present). That three-way split maps to real differences in how an agent acquires knowledge: passively, explicitly, or contextually. The "prefer retrieval over training" philosophy running through every skill is equally deliberate: the platform moves fast, and rather than pretending pre-training data is current, the skills direct agents to fetch live documentation and lean on five remote MCP servers (cloudflare-api, cloudflare-docs, cloudflare-bindings, cloudflare-builds, cloudflare-observability). With 341 reference files covering the full platform breadth and a main cloudflare umbrella skill built around decision trees, this is less a skill and more a live knowledge base an agent can navigate on demand. This approach is compelling, even though I feel it is complicated, and likely to be less modular and composable with other providers.

Microsoft Skills
Microsoft's skills repo is operating at a different scale than everyone else: 132 skills across Python, TypeScript, .NET, Java, and Rust, organized into a symlink tree so the same skill is reachable via both its canonical path and language-category navigation paths like skills/python/data/cosmos-db. The package frontmatter field is unique to Microsoft and names the exact pip/npm/nuget/maven package to install, removing any ambiguity about dependency management. What really separates them is the quality infrastructure: 1,158 YAML test scenarios across 128 skills with prompts, expected patterns, forbidden patterns, and mock responses; acceptance criteria files documenting correct and incorrect code patterns like a code review rubric; and a docs/llms.txt at a predictable path linking 142 skills, with a companion llms-full.txt that inlines everything. The deep-wiki meta-plugin rounds it out by generating Mermaid diagrams, onboarding guides, and VitePress documentation sites from a repository's structure--a plugin for producing the agent-ready documentation layer of a codebase, which speaks to the documentation work I've been doing recently.

OpenAI Skills
OpenAI's skills repository is the most token-conscious of the bunch, and they're explicit about why: the skill-creator skill frames the context window as "a public good shared with conversation history, system prompts, and other skills," and every sentence should be challenged accordingly. This produces leaner skills, enforced structurally by a no-auxiliary-docs rule: READMEs, changelogs, and installation guides are not allowed in skill directories. The cleanest design choice here is the two-file metadata split: SKILL.md is purely for agent consumption (triggering, workflows, decision logic), while agents/openai.yaml handles everything product-facing--display name, UI description, icons, brand color, a default_prompt that must reference the skill name via $skill-name syntax, and MCP server dependencies with transport type and URL. The three-tier hierarchy (.system/ always loaded, .curated/ installable, .experimental/ in progress) gives the repo a lifecycle model that other repos don't have--providing me with a blueprint that I can recommend elsewhere.

Postman Skills
Postman's skills repo is the smallest in the survey: two skills, six files, no build system, no CI, no manifests. The postman skill covers the full API lifecycle across seven workflows, and the postman-api-readiness skill introduces a scoring framework for a question the industry hasn't formalized yet: can an AI agent reliably use this API? The answer involves 48 checks across eight pillars, a weighted scoring formula, and a 70% threshold verdict. The allowed-tools frontmatter field is the most interesting innovation here, and I haven't seen it anywhere else: each skill explicitly whitelists which tools it can invoke (Bash, Read, Write, Glob, Grep, plus glob pattern matching for MCP tools via mcp__postman__*). Paired with mcp-limitations.md, which documents known bugs in the Postman MCP server with explicit workarounds, you get a skill that's transparent about what it can do and what it cannot--a combination the broader ecosystem should adopt.
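A sketch of what that frontmatter might look like--the description text here is paraphrased for illustration, not copied from Postman's repo:

```yaml
---
name: postman
description: Manage the full API lifecycle in Postman, including collections, environments, mocks, and monitors.
allowed-tools: Bash, Read, Write, Glob, Grep, mcp__postman__*
---
```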

Speakeasy Skills
SpeakEasy has built the most test-driven skills repository of the group: the evals/ directory contains a Python-based evaluation harness using the Claude Agent SDK, YAML test suites covering activation, correctness, completeness, hallucination detection, and workflow validation, and fixture OpenAPI specs for realistic inputs, all documented in a dedicated SKILL_EVALUATION_FRAMEWORK.md. The speakeasy-context meta-skill is also worth calling out: at just 31 lines, it acts as a workflow bookend that every other skill inherits, wrapping each task with speakeasy agent context before and speakeasy agent feedback after. The overlay-first philosophy runs equally clearly through the whole repo--never modify a source OpenAPI spec directly--and the diagnostic skills back it up with decision framework tables that map issue types (naming, structural, design) to specific fix strategies rather than leaving that routing to the agent's judgment. SpeakEasy shared on the podcast the other day that they have a Slack channel for agents to bitch and complain about issues they encounter.

Vercel Skills
Vercel has built the most engineered infrastructure of any skills repo in the survey. The packages/react-best-practices-build/ TypeScript package compiles 107 individual rule files, each with YAML frontmatter declaring title, impact level, and tags, into a single generated AGENTS.md via a pipeline. The individual rule files are the single source of truth; the generated output is a distribution artifact. Impact scoring at the rule level (CRITICAL, HIGH, MEDIUM, LOW) means an agent reviewing a React component gets a prioritized list, not just a list. The deploy-to-vercel skill adds environment-aware execution with explicit branching for Claude Code (local terminal), claude.ai (sandboxed), and Codex (custom sandbox), because Vercel is building skills for production deployment scenarios where environment differences are real, and the engineering reflects that.
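A hypothetical rule file in that style might look like the following; the frontmatter field names are inferred from the description above, not copied from Vercel's package:

```markdown
---
title: Avoid creating objects inline in render
impact: HIGH
tags: [performance, rendering]
---

Creating new objects or arrays inline in a component body produces a new
reference on every render, which defeats memoization further down the tree.
```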

Categories of Skills
Across the eight repos, a handful of categories appear repeatedly: cloud platform deployment and tooling, API development and testing, AI/ML model training and inference, document processing, frontend and React development, and meta-skills for building more skills. The cloud platform category is the deepest--Cloudflare alone has skills covering compute, storage, networking, security, media, and AI services--but API tooling is arguably where the most interesting work is happening. SpeakEasy and Postman are both building skills specifically oriented around OpenAPI and SDK generation workflows, which is a signal that the API lifecycle is becoming a first-class use case for agent capabilities, and reflects my desire to see all APIs.json properties become Agent Skills.
The meta-skill category really gets me imagining what is possible. Anthropic has skill-creator and mcp-builder, OpenAI has skill-creator and skill-installer, SpeakEasy has speakeasy-context, and Microsoft has deep-wiki. These aren't skills for doing domain work; they're skills for building the infrastructure around agent capabilities. The fact that multiple vendors independently arrived at the need for skills-about-skills suggests the ecosystem is entering a second-order phase where the tooling for skill development is becoming as important as the skills themselves. When you can ask an agent to help you write a better skill, or to generate documentation that other agents can consume, the leverage compounds in interesting ways, while also spanning core and operational work.

Core Conventions
Every repo in my survey independently converged on the same core structure: a kebab-case directory name, a SKILL.md with YAML frontmatter, a name field, and a description field that drives triggering. No one was required to do it this way; the spec exists, but it doesn't mandate much beyond the SKILL.md filename. The convergence happened because the design reflects patterns already familiar from OpenAPI and other specs. The description field as the primary triggering mechanism means agents can make routing decisions without loading the full skill, which keeps context consumption flat as the number of available skills grows. The progressive disclosure model--metadata always available, body on trigger, references on demand--falls out of this naturally, which makes the description pretty damn important.
The other convention that appears everywhere without being mandated is the references/ subdirectory pattern. Whether it's Cloudflare's 341 files or Postman's two, every repo with complex skills separates deep documentation from the main SKILL.md and links to it explicitly. This reflects what actually got me interested in MCP. The reasoning is the same everywhere: skills that try to include everything in one file become too heavy to load efficiently, and they mix triggering information with reference material that only matters in specific workflows. Keeping references separate also makes them easier to update independently of the skill logic. The pattern has emerged as a community norm before anyone has written it into the specification, which speaks to the looseness of the spec, but also the potential feedback loop I want to capture from different providers.

Key Differentiators
The single biggest differentiator across these Agent Skills repos is the investment in evaluation and testing. Most vendors ship skills and assume they work. SpeakEasy has a Python eval harness with YAML test suites. Microsoft has 1,158 test scenarios and acceptance criteria files. Vercel has a TypeScript build pipeline with a dedicated extract-tests.ts module. These speak to the difference between skills that reliably trigger and guide correctly versus skills that work most of the time but fail in ways that could be hard to diagnose. As the number of available skills grows and they start composing with each other, this quality gap will matter more, not less, but I am concerned with how this plays out equally across providers.
The second major differentiator is MCP integration depth, which connects back to last week's newsletter. Cloudflare runs five remote MCP servers. Postman's entire skill architecture is organized around 110+ Postman MCP tools. Microsoft references MCP server configuration in .vscode/mcp.json. OpenAI declares MCP dependencies per-skill in agents/openai.yaml. MCP is becoming the integration layer between skills and external services, and the vendors who are investing in it seem to be the ones whose skills have the most operational surface area. Pure-documentation skills seem to be getting away without MCP; skills that actually do things at runtime increasingly can't, which leaves me questioning some of the documentation and knowledge base learnings I found with MCP.

Manifest Type
The way a repo declares what skills it contains reveals a lot about its distribution philosophy for me. Anthropic uses marketplace.json as a structured manifest, and groups skills into thematic bundles. Microsoft uses marketplace.json at the top level but also maintains a symlink tree that makes skills browsable by language and category without needing to read the manifest. Cloudflare adds .cursor-plugin/plugin.json alongside .claude-plugin/plugin.json for cross-platform coverage. Postman ships with no manifest at all, relying on the README as the only index.
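As a rough sketch of the bundled-manifest style, a marketplace.json might group skills like this; the field names and values are illustrative, not copied from any vendor's manifest:

```json
{
  "name": "example-skills",
  "bundles": [
    {
      "name": "document-skills",
      "description": "Skills for working with PDF, DOCX, and XLSX files",
      "skills": ["pdf", "docx", "xlsx"]
    }
  ]
}
```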
The manifest format question is where I am really interested at this moment. Right now, different vendors are making incompatible choices about how to group, version, and describe skills in their manifests, which means tools that want to support multiple repos need to handle multiple manifest structures. The marketplace.json format from Anthropic is the closest thing to a common baseline, but it's not universally adopted. Microsoft's plugin-level metadata with keywords and categories and OpenAI's tier-based directory structure represent different opinions about how discovery could work. A unified and standardized manifest schema would help the ecosystem here; it's the kind of thing that usually gets standardized once enough tooling has to support all the variants, and I feel it is an opportunity for Naftiko.

MCP Integrations
As mentioned earlier, Model Context Protocol is showing up everywhere in the skills landscape, but it's showing up differently depending on the vendor's relationship to tooling versus documentation. Cloudflare defines five remote MCP servers in .mcp.json, not just for documentation access but for live operations against the platform: deploying builds, querying observability data, accessing bindings. Postman's entire operational skill layer is built on MCP tools, with 110+ available operations cataloged in a reference file. These aren't supplementary integrations; MCP is the primary execution mechanism for these skills.
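A minimal sketch of an .mcp.json declaring a remote server, with a hypothetical server name and URL; the exact schema varies by client, so treat this as illustrative rather than definitive:

```json
{
  "mcpServers": {
    "example-docs": {
      "type": "http",
      "url": "https://example.com/mcp"
    }
  }
}
```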
The more interesting pattern is how vendors are starting to ship MCP server implementations alongside their skills. OpenAI's agents/openai.yaml declares MCP server dependencies with transport types and URLs, turning skill installation into a dependency resolution step. Microsoft references .vscode/mcp.json for server configuration. What this adds up to is a skills-plus-servers bundle that gives agents both the knowledge layer (what to do) and the execution layer (how to do it at runtime). The separation between documentation skills and operational skills is what I am interested in getting at when it comes to context engineering.

Multi-platform
Skill portability across agent platforms is a stated goal of the agentskills.io spec, and the repos vary widely in how seriously they take it. SpeakEasy lists 15+ supported platforms in their README. Cloudflare ships both .claude-plugin/ and .cursor-plugin/ manifests. Most other repos are implicitly Claude Code-first, with other platform support added in seemingly arbitrary ways based upon their experiences. Some providers look like they did their research to understand what they should do, while others appear to have acted based upon how they see the world, which is something I think will mature as things stabilize in the AI hustle.
The multi-platform challenge isn't just about manifest formats, it's about execution environments. Vercel's deploy skill explicitly branches based on whether it's running in Claude Code (local terminal with full filesystem access), claude.ai (sandboxed), or Codex (custom sandbox with different escalation paths). Bash script conventions matter differently depending on the sandbox. File paths resolve differently depending on the environment. Skills that look portable because they use the same SKILL.md format can still behave inconsistently across platforms when they hit actual execution. The vendors who are thinking about this carefully are documenting environment-specific behavior inside their skills rather than assuming portability happens automatically once the SKILL.md is installed. Something I'd like to see further standardized.

Decision Frameworks
Several repos have independently arrived at the decision framework pattern: structured tables inside a skill that map user intent or problem type to a specific workflow or response. Cloudflare's umbrella skill uses decision trees to route users to the right product given a stated need. SpeakEasy's diagnostic skills map issue types (naming, structural, design) to fix strategies (overlay, ask user, produce strategy document). Postman's skills include routing tables that map user intent phrases to named workflows. These aren't just documentation; they're structured guidance that agents can follow deterministically rather than having to reason from first principles every time, which speaks to what I'm hearing from some design partners about how they need to support different skill levels.
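A decision framework in this style might look like the table below; the rows and workflow names are illustrative, not taken from any of the surveyed repos:

```markdown
| User intent                      | Workflow                   |
|----------------------------------|----------------------------|
| "Fix the naming in this spec"    | apply-naming-overlay       |
| "My spec fails validation"       | diagnose-structural-issues |
| "Is this API ready for agents?"  | run-readiness-assessment   |
```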
What makes decision frameworks valuable for skills is that they encode domain expertise in a form that transfers reliably. An agent that has access to a well-structured decision table for a domain it doesn't know well can navigate to a correct response more reliably than one reasoning from general knowledge. The pattern also tends to make the skill's logic auditable: you can read the decision table and understand exactly what the skill will recommend under different conditions. SpeakEasy's anti-pattern sections work the same way in reverse: by listing what the skill should never do, they constrain the solution space and reduce the chance that an agent drifts into a wrong answer that technically satisfies the surface-level request while violating domain best practices. This is some of the strategic opportunity I have been concerned is missing, so it gives me hope that it is present.

Marketplace & Distribution
The distribution question for skills is still being worked out, and the repos show it. Most use some form of npx skills add {org}/{repo} as the install command, which is clean and consistent. But what you get when you install varies significantly: a flat list of skills, a curated bundle, a plugin namespace, or a tiered hierarchy. Anthropic groups skills into document-skills, example-skills, and claude-api bundles. OpenAI tiers skills into .system/ (always on), .curated/ (installable), and .experimental/ (in progress). Microsoft organizes by language and category in a symlink tree. These are different distribution philosophies, not just different folder structures.
The versioning question is also mostly unresolved. A few repos include version fields in frontmatter (metadata.version: "2.0.1" in Postman's case, version: "3.0.0" in Vercel's), but there's no ecosystem-level answer for how skill versions are discovered, pinned, or updated. For skills that are purely documentation, this matters less; they're not dependencies in the traditional sense. But skills that include scripts, reference specific MCP server versions, or depend on particular tool capabilities are effectively versioned artifacts, and treating them as static markdown files will create problems as the ecosystem matures. The marketplaces being built around the skills format need to take versioning seriously before the lack of it catches up with teams who haven't been considering the future.

OpenAPI & MCP & Agent Skills
The intersection of OpenAPI, MCP, and agent skills is where things get genuinely interesting for the API industry. SpeakEasy's entire skills repo is organized around the OpenAPI lifecycle: writing specs, extracting them from code, managing overlays, generating SDKs, creating Terraform providers, building MCP servers from specs. These are the operations that used to require manual developer effort at every step, and SpeakEasy is encoding the best practices for each into skills that agents can execute. The writing-openapi-specs skill alone has six reference files covering schemas, parameters, request bodies, responses, security, and components. That's not supplementary documentation; it's the knowledge base for an agent that can write production-quality OpenAPI.
I am reluctant to admit that Postman's postman-api-readiness skill points to where this could be heading at the API design level. The question shifts from "is this API well-designed?" to "can an AI agent reliably use this API?", and those are different questions with different answers. The eight pillars (Metadata, Errors, Introspection, Naming, Predictability, Documentation, Performance, Discoverability) and 48 specific checks aren't arbitrary design preferences. They're the criteria that determine whether an agent will succeed or fail when trying to use an API autonomously, but honestly I feel like we shouldn't just assess and fail APIs on these problems; we should be fixing them before they get expressed as MCP or Agent Skills, which is something we are addressing with Naftiko.

Opportunity & Challenges
I am three days late doing this newsletter because I wanted to go through the spec and each of these top-tier providers I have showcased in the Naftiko newsletter. I spent time this week talking through my research, which I used to produce this newsletter. The manifests, distribution, marketplace, taxonomy, and relationship to OpenAPI and MCP require more digging. I like the looseness of the spec, and I like all the ways people are trying to balance determinism and non-determinism as part of their approaches to Agent Skills. I think this is critical for the future of Agent Skills.
I have Agent Skills for API Evangelist, and I am developing more for Naftiko. I want to lean on API service providers to publish a repository with their Agent Skills like SpeakEasy and Postman have. I'd like to invest in more discussion around the maturity of Agent Skills, and think more deeply about how they mature and are used across platforms. These skills all seem very selfish and self-serving, but so are most OpenAPIs I gather. I am interested in some sort of rating system for skills, and how we are going to govern them as well as assess their quality. I am most interested in discovery of Agent Skills using new federated and ephemeral approaches instead of just catalogs and registries.
With research on MCP and Agent Skills initiated, I will merge it with my research across other specifications. I am conducting more conversations with SpeakEasy about the relationship between OpenAPI and Agent Skills. I see Agent Skills as a derivative of OpenAPI, but also APIs.json. I see MCP and Agent Skills as outputs alongside REST APIs of a Naftiko Capability, but I think it will be the business and domain alignment of Agent Skills that matters the most, producing the context we need to have the impact we desire. The domain separation present in the research was the most interesting for me when it comes to business alignment, but I also think the plain-language markdown and YAML nature of skills provides the greatest opportunity to produce stories that bridge product and engineering.
"Only those who have patience to do simple things perfectly ever acquire the skill to do difficult things easily." — James J. Corbett