/ Article — Architecture

Async Webhooks in WordPress: Non-Blocking Architecture for Reliable Delivery

Synchronous webhook calls block PHP execution, expose users to third-party latency, and silently drop data when endpoints are unavailable. This article explains how to move webhook dispatch out of the request cycle entirely — using a queue, a cron worker, exponential backoff, and structured logging.

/ The Problem

What goes wrong with synchronous webhooks

PHP is single-threaded and synchronous. When WordPress fires an action hook — say, woocommerce_order_status_completed — every listener attached to that hook runs inline, before the response is sent to the browser. If one of those listeners makes an outbound HTTP request to a webhook endpoint, the entire request stalls until that endpoint responds.

That is fine when the endpoint is fast and always available. In practice, endpoints rarely are. External APIs go down. CDNs throttle. A Zapier or n8n webhook URL can take three to five seconds to acknowledge. On a store handling hundreds of WooCommerce orders per hour, every one of those checkouts absorbs that latency: checkout pages can take five to ten seconds longer than they should, or fail entirely if the endpoint returns a 5xx or times out.

The failure mode is worse than slowness: if the request times out, the data is lost. There is no retry, no log entry, no signal that delivery failed. The order completed from WooCommerce's perspective but your CRM, ERP, or automation platform never received the event.

Non-blocking webhook dispatch solves all three problems. The user-facing request is never held up by an outbound HTTP call. Failed deliveries are retried automatically. Every attempt — success or failure — is observable.

/ Comparison

Synchronous vs Asynchronous webhooks

Aspect                     Synchronous                         Asynchronous
-------------------------  ----------------------------------  -----------------------------------
Execution model            Inline, blocks PHP thread           Background worker, non-blocking
Timeout risk               Times out → data lost silently      Worker retries on failure
Retry on failure           None; one attempt only              Configurable retry schedule
Impact on user request     Adds endpoint latency to page load  Zero impact on response time
Observability              No log; silent failures             Per-attempt status, code, timestamp
Implementation complexity  Low; one wp_remote_post call        Requires queue table + cron worker

The trade-off is clear: synchronous delivery is simpler to write but fragile in production. Asynchronous dispatch requires more upfront infrastructure but eliminates the most common failure modes — and is the correct choice for anything beyond development or very low-volume sites.

/ Architecture

How the async queue works

  User Request
       │
       ▼
  WordPress Action Hook (e.g. woocommerce_order_status_completed)
       │
       ▼
  Listener: writes job to queue table
  { endpoint, payload, attempt=0, status=pending }
       │
       ▼
  Response sent to user  ◄── request ends here
       │
       (background)
       ▼
  Cron Worker (WP-Cron or system cron)
       │
       ├── Fetch pending jobs (batch)
       │
       ▼
  wp_remote_post( $endpoint, $payload )
       │
       ├─ 2xx → mark job complete, log success
       │
       └─ failure → increment attempt, schedule retry
              │
              └─ attempt >= max → move to dead-letter, alert

Queue table. Jobs are stored in a custom database table, not in the options or postmeta tables. This gives you efficient queries by status, fast batch fetching, and straightforward cleanup of old rows, none of which is practical with the WordPress options API at scale.
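
The table can be created on plugin activation with dbDelta. The following is a minimal sketch, assuming the column names used by this article's enqueue code plus an auto-increment id. The function names, column sizes, and index name are illustrative assumptions, not a fixed schema.

```php
<?php
/**
 * Build the CREATE TABLE statement for the queue (hypothetical helper).
 * dbDelta() is strict about formatting: one column per line, two spaces
 * after PRIMARY KEY, and KEY rather than INDEX.
 */
function my_webhook_queue_schema( $table, $charset_collate = '' ) {
    return "CREATE TABLE {$table} (
  id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  endpoint varchar(2048) NOT NULL,
  payload longtext NOT NULL,
  status varchar(20) NOT NULL DEFAULT 'pending',
  attempt smallint(5) unsigned NOT NULL DEFAULT 0,
  created_at datetime NOT NULL,
  scheduled_at datetime NOT NULL,
  PRIMARY KEY  (id),
  KEY status_scheduled (status,scheduled_at)
) {$charset_collate};";
}

/** Create or update the table; intended to run on plugin activation. */
function my_webhook_queue_install() {
    global $wpdb;
    require_once ABSPATH . 'wp-admin/includes/upgrade.php';
    dbDelta( my_webhook_queue_schema(
        $wpdb->prefix . 'webhook_queue',
        $wpdb->get_charset_collate()
    ) );
}

// Guarded so the file can also be loaded outside WordPress (e.g. in tests).
if ( function_exists( 'register_activation_hook' ) ) {
    register_activation_hook( __FILE__, 'my_webhook_queue_install' );
}
```

The composite index on (status, scheduled_at) is chosen to match a worker query that filters on status and orders by scheduled_at; a different access pattern may call for a different index.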

Worker. A WP-Cron event (or a system cron hit to a REST endpoint) runs on a regular interval — every minute is typical — and processes a batch of pending jobs. System cron is preferred for reliability: WP-Cron only runs when someone visits the site, which is not guaranteed. For a full guide on setting up a real cron job for WordPress — including the crontab entry, WP-CLI alternative, and Action Scheduler — see Cron Job for WordPress: WP-Cron Limits and Real Fixes.
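
WP-Cron has no built-in one-minute interval, so a custom schedule has to be registered before the worker event can be scheduled. A minimal sketch, assuming the worker is hooked to an event named my_webhook_cron; the schedule name my_every_minute and both function names are hypothetical.

```php
<?php
/**
 * Add a one-minute interval to WP-Cron's schedules.
 * 'my_every_minute' is a hypothetical schedule name.
 */
function my_webhook_cron_schedules( $schedules ) {
    $schedules['my_every_minute'] = [
        'interval' => 60, // seconds
        'display'  => 'Every minute (webhook queue)',
    ];
    return $schedules;
}

/** Schedule the worker event once; safe to call on every request. */
function my_webhook_schedule_worker() {
    if ( ! wp_next_scheduled( 'my_webhook_cron' ) ) {
        wp_schedule_event( time(), 'my_every_minute', 'my_webhook_cron' );
    }
}

// Guarded so the file can also be loaded outside WordPress (e.g. in tests).
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'cron_schedules', 'my_webhook_cron_schedules' );
    add_action( 'init', 'my_webhook_schedule_worker' );
}
```

The wp_next_scheduled check keeps the event from being registered twice. On plugin deactivation the event would normally be removed again, for example with wp_clear_scheduled_hook.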

Retry scheduling. When a job fails, the worker does not immediately requeue it. Instead, it calculates the next attempt time using exponential backoff and sets a scheduled_at timestamp. The job becomes visible to the next worker run only after that time elapses.

Dead-letter. After a configurable maximum number of attempts, the job is moved to a permanent failure state rather than retried indefinitely. This prevents a single bad endpoint from consuming queue capacity forever.

/ Implementation

Enqueueing webhooks on WordPress actions

The listener is registered with add_action. When the action fires, the listener builds the payload and writes a row to the queue table. Nothing is dispatched at this point — that happens later, in the background.

queue-enqueue.php — listener and enqueue logic
    /**
     * Register the listener on plugin init.
     * Uses priority 20 to run after WooCommerce's own handlers.
     */
    add_action( 'init', function() {
        add_action( 'woocommerce_order_status_completed', 'my_enqueue_order_webhook', 20, 1 );
    } );

    function my_enqueue_order_webhook( $order_id ) {
        $order = wc_get_order( $order_id );
        if ( ! $order ) {
            return;
        }

        // Build a structured payload; keep it consistent across retries.
        $payload = [
            'hook'      => 'woocommerce_order_status_completed',
            'order_id'  => $order_id,
            'total'     => $order->get_total(),
            'currency'  => $order->get_currency(),
            'email'     => $order->get_billing_email(),
            'timestamp' => time(),
            'site_url'  => get_site_url(),
        ];

        // Write to the queue table. Nothing is sent yet.
        my_queue_insert( 'https://your-endpoint.example.com/webhook', $payload );
    }

    function my_queue_insert( $endpoint, $payload ) {
        global $wpdb;

        $wpdb->insert(
            $wpdb->prefix . 'webhook_queue',
            [
                'endpoint'     => $endpoint,
                'payload'      => wp_json_encode( $payload ),
                'status'       => 'pending',
                'attempt'      => 0,
                'created_at'   => current_time( 'mysql', true ),
                'scheduled_at' => current_time( 'mysql', true ),
            ]
        );
    }

The payload is serialised as JSON at enqueue time, not at dispatch time. This ensures the data captured reflects the state of the order at the moment the action fired — even if the order is modified before the worker runs. The scheduled_at column controls when the worker first picks up the job; on initial insert it's set to now, so the job is eligible immediately.

/ Retry Logic

Exponential backoff & retry scheduling

Simple retry — "try again immediately on failure" — is usually the wrong approach. It floods a recovering endpoint with requests, potentially making the outage worse. Exponential backoff spaces retries further apart on each successive failure, giving the endpoint time to recover.

The formula is straightforward:

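delay = base_delay × 2^(attempt − 1)

Here attempt is the number of the attempt that just failed, counted from 1, so the first retry waits exactly base_delay.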

With a base delay of 60 seconds (1 minute), the retry schedule looks like this:

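Attempt 1 → 1 min → Attempt 2 → 2 min → Attempt 3 → 4 min → Attempt 4 → 8 min → Attempt 5 → Dead-letter

With a maximum of five attempts, a fifth failure moves the job straight to dead-letter; a 16-minute delay would only apply if a sixth attempt were allowed.
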
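
The schedule can be reproduced with a few lines of plain PHP. This is a minimal sketch, assuming attempts are counted from 1 and a 60-second base delay; my_backoff_delay is a hypothetical helper name.

```php
<?php
/**
 * Delay in seconds before the next try, after attempt number $attempt
 * (counted from 1) has failed. Hypothetical helper.
 */
function my_backoff_delay( int $attempt, int $base_delay = 60 ): int {
    return $base_delay * ( 2 ** max( 0, $attempt - 1 ) );
}

foreach ( [ 1, 2, 3, 4 ] as $attempt ) {
    printf( "after attempt %d: wait %d s\n", $attempt, my_backoff_delay( $attempt ) );
}
// after attempt 1: wait 60 s
// after attempt 2: wait 120 s
// after attempt 3: wait 240 s
// after attempt 4: wait 480 s
```
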
queue-worker.php — dispatch with retry scheduling
    /**
     * Worker function, called by WP-Cron every minute.
     * Fetches a batch of due pending jobs and attempts delivery.
     *
     * my_queue_update(), my_queue_reschedule() and my_queue_log() are
     * small helpers (not shown) that write to the queue and log tables.
     */
    function my_webhook_worker() {
        global $wpdb;

        $table        = $wpdb->prefix . 'webhook_queue';
        $max_attempts = 5;
        $base_delay   = 60; // seconds

        // Fetch up to 25 jobs that are due now.
        $jobs = $wpdb->get_results(
            $wpdb->prepare(
                "SELECT * FROM {$table}
                 WHERE status = 'pending' AND scheduled_at <= %s
                 ORDER BY scheduled_at ASC
                 LIMIT 25",
                current_time( 'mysql', true )
            )
        );

        foreach ( $jobs as $job ) {
            $response = wp_remote_post( $job->endpoint, [
                'headers' => [ 'Content-Type' => 'application/json' ],
                'body'    => $job->payload,
                'timeout' => 10,
            ] );

            $attempt = (int) $job->attempt + 1;

            // The response code is '' for a WP_Error; cast so 0 means "no HTTP response".
            $status_code = (int) wp_remote_retrieve_response_code( $response );
            $success     = ! is_wp_error( $response ) && $status_code >= 200 && $status_code < 300;

            if ( $success ) {
                my_queue_update( $job->id, 'complete', $attempt );
            } elseif ( $attempt >= $max_attempts ) {
                // Exhausted retries: move to dead-letter.
                my_queue_update( $job->id, 'failed', $attempt );
            } else {
                // Schedule the next retry using exponential backoff.
                $delay        = $base_delay * pow( 2, $attempt - 1 );
                $next_attempt = gmdate( 'Y-m-d H:i:s', time() + $delay );
                my_queue_reschedule( $job->id, $attempt, $next_attempt );
            }

            // Log every attempt regardless of outcome.
            my_queue_log( $job->id, $attempt, $status_code, $success );
        }
    }
    add_action( 'my_webhook_cron', 'my_webhook_worker' );

A few details worth noting. The timeout on wp_remote_post is set to 10 seconds, longer than the default 5-second WordPress HTTP timeout, which is often too short for automation platform webhooks. Increase it if your endpoint is consistently slower, but keep in mind that jobs in a batch are sent one after another, so each slow request delays the rest of the batch. If you dispatch high volumes, add jitter (a small random offset applied to each retry delay): it prevents many failed jobs from retrying in the same second and hitting a recovering endpoint all at once, the so-called thundering herd.
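
Jitter takes only a few lines. The sketch below is one possible approach, assuming an offset of up to 20% of the computed delay; both the ratio and the helper name my_add_jitter are illustrative choices.

```php
<?php
/**
 * Add random jitter to a retry delay (hypothetical helper).
 * Returns a whole number of seconds in [$delay, $delay + $delay * $ratio].
 */
function my_add_jitter( int $delay, float $ratio = 0.2 ): int {
    $max_offset = (int) floor( $delay * $ratio );
    return $delay + random_int( 0, max( 0, $max_offset ) );
}

// A 240-second backoff becomes something between 240 and 288 seconds.
$delay        = my_add_jitter( 240 );
$next_attempt = gmdate( 'Y-m-d H:i:s', time() + $delay );
```

Jitter here only ever lengthens the delay, so a job is never retried earlier than plain exponential backoff would allow.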

/ Failure Handling

Dead-letter queues & permanent failure

Not every failure is transient. An endpoint that has been decommissioned, a URL that now returns 404, or a payload that the receiver rejects with 400 — these won't be fixed by retrying. After a configured maximum number of attempts, the job should be moved to a dead-letter state rather than left in the retry queue indefinitely.

Mark jobs as permanently failed after five to ten attempts, depending on the criticality of the data. Store the final HTTP status code and response body alongside the job record so the failure reason is inspectable without needing to reproduce the error.

Distinguishing failure types

Use is_wp_error() to catch network-level failures — DNS resolution errors, connection refused, SSL handshake failures. These are separate from HTTP-level failures (4xx, 5xx) where the endpoint was reached but rejected the request.

Treat 4xx responses differently from 5xx. A 400 Bad Request or 422 Unprocessable Entity is unlikely to resolve itself: the payload is malformed from the endpoint's perspective, and retrying wastes attempts. Consider marking 4xx-triggered jobs as failed immediately (or after one confirmation attempt) rather than exhausting the full retry schedule. The usual exceptions are 408 Request Timeout and 429 Too Many Requests, which describe transient conditions and are worth retrying.

A 503 Service Unavailable or a network timeout, by contrast, is exactly what exponential backoff is designed for.
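
These rules can be collected in one small decision function. The following is a sketch under stated assumptions: the function name and return values are hypothetical, and treating 408 and 429 as retryable is a policy choice made here, not something every receiver requires.

```php
<?php
/**
 * Decide what to do with a job after one delivery attempt.
 * Hypothetical helper: returns 'complete', 'retry' or 'failed'.
 *
 * @param bool $network_error True when wp_remote_post() returned a WP_Error.
 * @param int  $status_code   HTTP status, or 0 when there was no response.
 * @param int  $attempt       Number of the attempt just made, counted from 1.
 */
function my_classify_attempt( bool $network_error, int $status_code, int $attempt, int $max_attempts = 5 ): string {
    if ( ! $network_error && $status_code >= 200 && $status_code < 300 ) {
        return 'complete';
    }

    // Most 4xx responses will not change on retry; 408 and 429 are transient.
    $permanent = ! $network_error
        && $status_code >= 400 && $status_code < 500
        && ! in_array( $status_code, [ 408, 429 ], true );

    if ( $permanent || $attempt >= $max_attempts ) {
        return 'failed';
    }

    // Network errors, 5xx, 408, 429 and anything unexpected get another try.
    return 'retry';
}
```

In a worker loop, the first argument would come from is_wp_error( $response ) and the second from the integer-cast response code.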

Alerting on persistent failures

When a job transitions to the failed state, trigger an alert. At minimum, write to the WordPress error log via error_log(). For production systems, fire a do_action hook that can be wired to an email notification, a Slack message, or an internal monitoring endpoint. Unmonitored dead-letter queues that silently accumulate are as bad as no queue at all. Dead-letter events also need a recovery path: once the underlying problem is fixed, they should be replayable on demand — see the full retry and replay architecture for how that works.
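
A minimal alerting sketch, assuming the worker calls it at the moment a job is marked failed. The function name and the hook name my_webhook_job_failed are hypothetical.

```php
<?php
/**
 * Raise an alert when a job reaches the dead-letter state.
 * The hook name 'my_webhook_job_failed' is a hypothetical example.
 */
function my_webhook_alert_failed( int $job_id, string $endpoint, int $status_code ): string {
    $message = sprintf(
        '[webhook-queue] job %d to %s failed permanently (last status: %d)',
        $job_id,
        $endpoint,
        $status_code
    );

    // Minimum: leave a trace in the PHP error log.
    error_log( $message );

    // Let other code (email, Slack, monitoring) subscribe to the event.
    if ( function_exists( 'do_action' ) ) {
        do_action( 'my_webhook_job_failed', $job_id, $endpoint, $status_code );
    }

    return $message;
}
```

A notifier would then attach with add_action( 'my_webhook_job_failed', $callback, 10, 3 ), which keeps the worker itself free of email or Slack details.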

/ Observability

Logging & queue monitoring

A queue without observability is a black box. At minimum, log the following fields for every dispatch attempt:

Per-attempt log record
    {
      "job_id": 1042,
      "endpoint": "https://hooks.example.com/order-complete",
      "payload_hash": "sha256:a3f9...",
      "attempt": 2,
      "status_code": 503,
      "wp_error": null,
      "duration_ms": 4821,
      "timestamp": "2026-02-19T14:03:22Z"
    }

Store the payload hash rather than the raw payload in the log table to avoid accidentally persisting sensitive order or user data in a secondary location. The full payload is already stored in the queue table against the job record.
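
One way to assemble such a record is sketched below. The helper name and its parameter list are assumptions for illustration; only the field names come from the record shown above.

```php
<?php
/**
 * Build one per-attempt log record (hypothetical helper).
 * Only a SHA-256 digest of the payload is kept, never the payload itself.
 */
function my_build_log_record( int $job_id, string $endpoint, string $payload_json, int $attempt, int $status_code, ?string $wp_error, int $duration_ms ): array {
    return [
        'job_id'       => $job_id,
        'endpoint'     => $endpoint,
        'payload_hash' => 'sha256:' . hash( 'sha256', $payload_json ),
        'attempt'      => $attempt,
        'status_code'  => $status_code,
        'wp_error'     => $wp_error,
        'duration_ms'  => $duration_ms,
        'timestamp'    => gmdate( 'Y-m-d\TH:i:s\Z' ),
    ];
}

// Timing a request: take microtime( true ) before and after the call.
$start = microtime( true );
// ... wp_remote_post() would run here ...
$duration_ms = (int) round( ( microtime( true ) - $start ) * 1000 );

$record = my_build_log_record(
    1042,
    'https://hooks.example.com/order-complete',
    '{"order_id":7}',
    2,
    503,
    null,
    $duration_ms
);
echo json_encode( $record ), "\n";
```

Because the digest is computed from the stored JSON string, two attempts for the same job always log the same payload_hash, which makes it easy to confirm that retries sent identical data.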

Queue depth monitoring

Track pending and failed job counts as operational metrics. A growing pending queue that isn't draining indicates the cron worker isn't running. A growing failed queue indicates a systematic endpoint problem. Both are actionable signals.

Expose a simple count query via a WP-Admin page or status panel. For automated monitoring, add a REST endpoint that returns queue stats as JSON; it is quick to write and can be polled by any uptime tool.
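
A queue-stats endpoint might look like the sketch below. The route namespace my-webhooks/v1, the path, the function names, and the choice of the manage_options capability are all assumptions; the status values are the ones used in this article.

```php
<?php
/**
 * Turn "SELECT status, COUNT(*) AS total ... GROUP BY status" rows into
 * a fixed-shape array (hypothetical helper).
 */
function my_queue_stats_from_rows( array $rows ): array {
    $stats = [ 'pending' => 0, 'complete' => 0, 'failed' => 0 ];
    foreach ( $rows as $row ) {
        $row = (array) $row; // $wpdb returns objects by default
        if ( isset( $stats[ $row['status'] ] ) ) {
            $stats[ $row['status'] ] = (int) $row['total'];
        }
    }
    return $stats;
}

/** REST callback: current queue depth per status. */
function my_queue_stats_endpoint() {
    global $wpdb;
    $table = $wpdb->prefix . 'webhook_queue';
    $rows  = $wpdb->get_results( "SELECT status, COUNT(*) AS total FROM {$table} GROUP BY status" );
    return my_queue_stats_from_rows( $rows ?: [] );
}

// Guarded so the file can also be loaded outside WordPress (e.g. in tests).
if ( function_exists( 'add_action' ) ) {
    add_action( 'rest_api_init', function () {
        register_rest_route( 'my-webhooks/v1', '/queue-stats', [
            'methods'             => 'GET',
            'callback'            => 'my_queue_stats_endpoint',
            'permission_callback' => function () {
                return current_user_can( 'manage_options' );
            },
        ] );
    } );
}
```

With this permission check, a monitoring tool has to authenticate as an administrator, for example with an application password. Loosening the check would expose queue depth publicly, which may or may not be acceptable for a given site.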

WP-Cron reliability

WP-Cron is not a real cron. It fires on page load, which means on a low-traffic site, your worker might not run for minutes or hours. For production, configure a real system cron entry that hits the WordPress cron URL directly (or a dedicated REST endpoint) on a fixed schedule:

System cron — run WP-Cron every minute
    # crontab entry: runs every minute regardless of site traffic
    * * * * * curl -s https://your-site.com/wp-cron.php?doing_wp_cron > /dev/null 2>&1

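
Once a system cron entry is the trigger, the page-load trigger is usually switched off so the two do not overlap. A minimal configuration sketch, assuming you can edit wp-config.php:

```php
<?php
// wp-config.php, above the "That's all, stop editing!" line.
// Stops WordPress from spawning WP-Cron on page loads; the system cron
// entry then becomes the only trigger.
define( 'DISABLE_WP_CRON', true );
```

Scheduled events still run with this constant set, as long as something requests wp-cron.php on a fixed schedule.
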
/ References

Official documentation

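All implementation patterns described here use WordPress-native APIs: add_action, wp_remote_post, $wpdb, dbDelta, and WP-Cron. The official WordPress developer documentation is the primary reference for each.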

/ Production Alternative

If you'd rather not maintain this yourself

Flow Systems Webhook Actions is an open-source async webhook plugin for WordPress that implements this architecture out of the box, including queue processing, retry logic with exponential backoff, and structured delivery logging. Delivery logs, retry triggers, and queue status are also accessible programmatically via its REST API; the REST API article covers monitoring, bulk retry, and automated recovery patterns in detail. If your team prefers configuration over maintaining custom queue infrastructure, see the plugin page for full details. The code is publicly available on GitHub and distributed via WordPress.org.

/ FAQ

Common questions

How is async webhook dispatch different from calling wp_remote_post directly?

wp_remote_post is a synchronous HTTP call: PHP waits for the remote server to respond before continuing execution. Async webhook dispatch wraps that call inside a background job: the request is written to a queue during the user-facing action and dispatched later by a cron worker.

The user's response time is unaffected regardless of endpoint latency or availability. The queue also enables retry on failure, which a bare wp_remote_post call cannot do.

How many retry attempts should I configure?

Five attempts covers most production scenarios: transient network failures and short outages typically resolve within the first few retries. Combine this with exponential backoff (1 min → 2 min → 4 min → 8 min between attempts) and a dead-letter mechanism to capture permanently failed jobs.

High-volume or compliance-sensitive systems may warrant more attempts with longer maximum delays. Avoid retrying indefinitely — it obscures systematic failures that need human attention.

Can I build this without a plugin?

Yes. The minimal implementation requires: a database table for the queue (created on plugin activation via dbDelta), an add_action listener that writes to that table, and a WP-Cron event that reads from the table and dispatches via wp_remote_post.

The code itself is manageable, but ongoing maintenance — schema migrations, retry logic, logging, queue depth monitoring — adds complexity over time. A custom implementation is a good choice if you need tight control over the queue behavior; a plugin is better if reliability is more important than customisation.

Why do webhook deliveries fail?

The most common causes, in order:

1. Endpoint timeout: the default wp_remote_post timeout is 5 seconds. Many automation platform webhooks (n8n, Zapier cold starts) take longer. Increase the timeout to 10–15 seconds.

2. 5xx errors from the receiving end: transient server errors, handled by retry logic.

3. SSL certificate issues: the destination server has an expired or misconfigured certificate.

4. WP-Cron not running: on low-traffic sites, WP-Cron may not fire for minutes or hours, causing jobs to pile up unprocessed rather than fail outright. Switch to system cron.