WP Webhooks / Blog / AI integration
Article · AI integration

Build an AI Agent in a WordPress Plugin: Architecture

Inside an in-admin AI agent for WordPress: plan-first JSON envelopes, typed ability steps, confirmation gates, undo stacks, and trace logging.

9 min 2026-06-26
#ai#architecture#wordpress

TL;DR: An in-admin AI agent can build WordPress integrations safely if it works plan-first: the model proposes an ordered plan of typed steps as JSON, and the plugin executes them locally under your permission model.

  • A strict JSON envelope (message + clarifying questions + plan) replaces native tool calling — so the loop works over any transport, including the WordPress 7.0 AI Client.
  • Safety is structural: new webhooks are created disabled, going live or deleting requires explicit confirmation, and every mutation snapshots prior state onto an undo stack.
  • Steps execute one at a time; a failure halts the run without touching later steps.
  • A dev trace (model, latency, full prompt, parse status) is the only way to debug an agent you cannot see.

/ Why

Why put an AI agent inside wp-admin?

Because the agent can touch what an external chatbot cannot: the site's real state. An in-admin agent building a webhook integration reads the actual trigger catalog, inspects captured payloads from hooks that already fired, sees the webhooks that already exist, and probes destination endpoints from the server — so it works from real data instead of guesses. This is the design behind Build with AI, the agent coming in the next major release of the Webhook Actions plugin: describe an integration in plain language, and the agent proposes, builds, and tests the webhooks, field mappings, conditions, and chains.

The catch is obvious: a language model with write access to a production site is a liability unless the architecture makes unsafe actions structurally hard. That is what plan-first execution is for.

/ The loop

What is a plan-first agent loop?

A loop where the model never mutates anything during the conversation. Each turn, the model receives the transcript plus a system prompt describing the available typed operations, and replies with a proposal: a short assistant message, optional clarifying questions, and an optional ordered plan of steps. The plugin normalizes the plan, shows it to the user for review and editing, and only then executes it — step by step, locally, through its own code paths.

The system prompt makes the contract explicit: the agent must never claim to have changed anything itself; it proposes, the plugin disposes. It is also told to prefer action over interrogation — when the goal is clear, propose the full plan with sensible defaults, leave genuinely unknown fields blank, and ask only for those.

Plan-first AI agent loop inside a WordPress pluginThe user describes a goal in chat. The model replies with a strict JSON envelope containing an assistant message, optional clarifying questions, and an ordered plan of typed ability steps. The user reviews and can edit the plan. The plugin then executes steps one at a time: destructive steps pause for explicit confirmation, every mutation snapshots the prior state onto an undo stack, and a failed step halts the run without touching later steps.

yes

no

next step

plan done

step fails

user describes goal in chat

LLM returns JSON envelope
message + questions + plan

user reviews / edits plan

plugin executes step N

destructive step?
enable / delete / edit live

pause for explicit confirm

run ability locally

snapshot pre-state
onto undo stack

integration live (revertible)

halt run
later steps untouched

FIG 01 — The plan-first agent loop

/ Protocol

How does a JSON envelope replace tool calling?

The model is instructed to reply with a single JSON object and nothing else. That envelope is the whole protocol — which means the loop runs identically over the Anthropic, OpenAI, and Google APIs and over the WordPress 7.0 AI Client, which has no native tool-calling at all.

JSON — the envelope the model must return

{
  "assistant_message": "I'll send new WooCommerce orders to your n8n workflow.",
  "clarifying_questions": ["What is the n8n webhook URL?"],
  "plan": [
    {
      "id": "step_1",
      "ability": "create_webhook",
      "summary": "Create the order webhook (disabled)",
      "input": { "name": "Orders to n8n", "endpoint_url": "" }
    },
    {
      "id": "step_2",
      "ability": "test_dispatch",
      "summary": "Send a test delivery",
      "input": { "webhook_id": "{{step_1.id}}" }
    }
  ]
}

Three details make this production-grade rather than a demo. Steps are typed: each names one ability from a fixed catalog, and its input must match that ability's JSON Schema — the model cannot invent operations. Steps can reference earlier results: {{step_1.id}} is substituted with the real id at run time, so the model can chain a build without knowing database ids. And parsing is defensive: strip code fences, fall back to the outermost brace span, and if the reply still is not valid JSON, treat the whole text as a plain assistant message with no plan — a malformed reply degrades to conversation, never to a broken mutation.

/ Safety

How does step-by-step execution stay safe?

Safety lives in the executor, not in the prompt. The prompt asks the model to behave; the executor makes misbehaviour inert:

  1. New webhooks are created disabled. The agent can build a complete integration, but nothing fires until a human flips it live.
  2. Destructive steps carry confirmation metadata. Enabling, deleting, or editing a live webhook always pauses for explicit confirmation; an endpoint probe pauses only when its HTTP method is unsafe.
  3. Steps run one at a time. The frontend advances the plan step by step; missing required input, an unmet prerequisite, or a needed confirmation pauses the run at that exact step.
  4. Failure halts, never cascades. A failed step stops the run with the error attached; later steps are never attempted against a half-built state.

Everything also passes the same capability checks as the admin UI — the agent holds no special powers, and credential values stay in a write-only vault the agent can reference but never read.

The model never calls tools natively — it proposes typed steps, and the plugin executes them locally. The site stays in control. — the core contract

/ Undo

How do undo and revert work?

Before the executor mutates anything, it snapshots the object's prior state onto the step record. Undo then walks the applied steps backwards: the last still-applied revertible step is found, its pre-state restored — or the object it created deleted — and the step marked reverted. Repeated calls walk further back, giving you a real undo stack rather than a single "oops" button.

One subtle but important touch: each undo is recorded into the conversation transcript. The model sees what was undone on its next turn, so it does not confidently reference a webhook that no longer exists.

/ Debugging

How do you debug an agent you cannot see?

With a trace, or not at all. Every model call records the provider and model that answered, latency, temperature, the full system prompt, the exact message array sent, the raw response, whether the JSON envelope parsed, and how many plan steps it contained. When a user reports "the AI did something weird", that trace is the difference between a fix and a shrug: a parse failure, a provider quota error, and a genuinely bad plan all look identical from the chat UI but completely different in the trace.

The trace matters double when your transport can silently switch models. With cross-provider fallback in play, "which model actually produced this plan?" must be answerable per request, not per configuration.

/ Toolset

What does the agent actually control?

A fixed registry of 17 typed abilities — the same toolset the plugin publishes to the WordPress Abilities API for external clients like Claude Code over MCP. One catalog of operations, one permission story, three consumers: the in-admin agent, the REST surface, and MCP tooling (the Abilities API article covers that side).

The system prompt also injects a compact catalog of the webhooks that already exist on the site, flagged with their numeric ids — so when the user says "rename the order webhook", the agent proposes an update to webhook #12 instead of creating a duplicate. Grounding the model in current state is as much a safety feature as the confirmation gates.

ConcernChatbot that "just does it"Plan-first agent (Build with AI)
MutationsModel output parsed and applied immediatelyTyped plan, reviewed and edited, stepped through
Destructive actionsOne hallucination from deleting live configCreated disabled + explicit confirm to go live or delete
Failure mid-taskHalf-applied state, no rollbackRun halts; undo stack restores pre-state snapshots
AuditabilityA chat logPer-step activity log + full model trace

This architecture powers Build with AI, coming in the next major release of the Webhook Actions plugin: describe the integration you want, review the plan, and it creates queued, retried, fully logged webhooks — disabled until you say go.

/Footnotes
¹ The Webhook Actions plugin on WordPress.org: wordpress.org/plugins/flowsystems-webhook-actions.
FAQ

Common questions always ask.

Don't see yours? Open an issue on GitHub or check the full reference in the API docs.

What is a plan-first AI agent? +
An agent that never mutates anything during the conversation. The model replies with a proposal — an assistant message, optional clarifying questions, and an ordered plan of typed steps — and the application executes the plan locally, step by step, after the user reviews and optionally edits it. The model proposes; the plugin disposes.
How can an AI agent work without native tool calling? +
With a structured-output protocol: the model is instructed to return a single strict JSON envelope containing the message, questions, and plan. Because the protocol lives in the prompt and parser rather than the transport, the same loop runs over the Anthropic, OpenAI, and Google APIs and over the WordPress 7.0 AI Client, which has no tool-calling.
How do you stop an AI agent from breaking a production site? +
Structurally, not with prompt engineering alone. New webhooks are created disabled, destructive steps (enable, delete, edit a live webhook) pause for explicit confirmation, steps execute one at a time and halt on failure, every invocation passes the same capability checks as the admin UI, and every mutation snapshots prior state onto an undo stack.
How does undo work in an AI agent? +
Before each mutation, the executor snapshots the object's prior state onto the step record. Undo walks the applied steps backwards, restoring each pre-state or deleting created objects, and marks the step reverted. Each undo is also written into the conversation transcript so the model knows on its next turn.
Can external AI tools drive the same agent toolset? +
Yes — by design: the agent's abilities are also published to the WordPress Abilities API under the plugin's namespace, so external MCP clients like Claude Code and Cursor can discover and invoke the identical operations over REST or the MCP Adapter — one toolset, one permission story, three consumers.
Ready

Stop losing webhooks.
Start logging them.

$ wp plugin install flowsystems-webhook-actions --activate