What is the cleanest way to do conditional branching in n8n?

Use the Switch node for three or more branches and the IF node for binary splits. Avoid chaining IF nodes: a Switch with explicit rules is more readable, easier to diff in Git, and by default routes each item to the first rule it matches, so no item has to pass through a chain of nodes. Put a NoOp at each branch terminus to make the visual graph readable.

How do I run n8n steps in parallel without race conditions?

Use SplitInBatches to control how many items go downstream per iteration, then a Merge node in "Combine by Position" or "Multiplex" mode to fan in. SplitInBatches works through its batches one after another, so on its own it throttles rather than parallelizes. For true parallelism on self-host, enable queue mode (EXECUTIONS_MODE=queue) with multiple worker replicas; in regular mode a single workflow execution runs its nodes sequentially. Idempotency keys belong on every side effect, not on the fan-out boundary.

How does n8n handle retries with exponential backoff?

Every node has a Retry On Fail setting with max tries and wait-between-tries in milliseconds. For true exponential backoff, wrap the call in a sub-workflow and use a Code node to compute the wait: `await new Promise(r => setTimeout(r, Math.min(2 ** ($json.attempt ?? 0) * 1000, 30000)))`. The `?? 0` default matters: without it a missing attempt field yields NaN and the wait is effectively skipped. n8n does not do backoff natively, but the pattern is only a few lines.

What is the right way to make an n8n workflow idempotent?

Hash the input payload to a stable key, check Redis or Postgres for the key before doing the side effect, write the key after success with a TTL. The Redis node and Postgres node are first-class; the whole pattern is 3 nodes. Idempotency belongs inside the workflow, not at the trigger — webhooks can fire twice.

When should I split logic into sub-workflows?

Three triggers: (1) the same logic is called from 2+ parent workflows, (2) the parent workflow exceeds ~30 nodes and becomes hard to read, (3) you want to retry just one stage without re-running the whole pipeline. Sub-workflows are typed function calls: extract one for the same reasons you would split up a 200-line function.

How do error workflows actually work in n8n?

In workflow settings, point "Error Workflow" at a dedicated error-handling workflow. When any node fails (after retries), n8n invokes that workflow with the failed execution payload: workflow id, node name, error message, run data. The error workflow typically alerts, logs to your observability stack, and decides whether the parent should be retried. Use one error workflow per project, not one per workflow.

n8n complex workflow patterns (2026): branching, fan-out, retries, and error handling

1. Conditional branching done right

The default reflex is to chain IF nodes. Don't. For anything past two branches, use a Switch node: it's readable in the canvas, diffs cleanly in Git when workflows are exported to JSON, and by default routes each item to the first rule it matches.

Pattern: route customer events by plan tier.

Trigger (webhook: customer.event)
  ↓
Switch node — rules:
  rule 1: {{ $json.plan === 'enterprise' }}  → Enterprise handler
  rule 2: {{ $json.plan === 'pro' }}         → Pro handler
  rule 3: {{ $json.plan === 'free' }}        → Free handler
  fallback: → Slack alert (unknown plan)

Each branch ends in a NoOp node before merging back — it makes the visual graph readable and gives you a stable handle to insert metrics later. The fallback rule is non-negotiable: an unhandled branch is a silent bug factory.
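To make the first-match-plus-fallback behaviour concrete, here is a minimal sketch in plain JavaScript of what the Switch rules above express. The function name `routeByPlan` and the output labels are hypothetical names chosen for illustration; they are not n8n APIs.

```javascript
// Hypothetical stand-in for the Switch rules above: the first matching rule wins.
const rules = [
  { output: 'enterprise-handler', matches: (item) => item.plan === 'enterprise' },
  { output: 'pro-handler', matches: (item) => item.plan === 'pro' },
  { output: 'free-handler', matches: (item) => item.plan === 'free' },
];

// Anything that matches no rule goes to the fallback instead of being dropped.
const FALLBACK_OUTPUT = 'slack-alert-unknown-plan';

function routeByPlan(item) {
  const rule = rules.find((r) => r.matches(item));
  return rule ? rule.output : FALLBACK_OUTPUT;
}
```

An item with an unexpected plan value (or no plan field at all) lands on the fallback output, which is the case the Slack alert exists to surface.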

2. Parallel fan-out and fan-in (the API-batch pattern)

Calling 200 external APIs sequentially takes roughly 200× the latency of one call. Within a single execution n8n runs nodes one after another, but you can raise throughput two ways:

Within a workflow: SplitInBatches to bound how many items go downstream per iteration (batches are processed one after another) + branches that don't depend on each other → Merge in "Combine by Position" mode.
Across workers: Set EXECUTIONS_MODE=queue, deploy 3-10 worker replicas behind Redis, and each sub-workflow execution lands on a free worker.

Example: enrich 500 leads from an internal API, then write to Postgres.

Postgres (SELECT 500 leads)
  ↓
SplitInBatches (batchSize=10)
  ↓
HTTP Request (enrich) — retry on fail: 3 tries, wait 2000ms
  ↓
Postgres (UPSERT enriched row)
  ↓
[loop back to SplitInBatches until done]
  ↓
Merge (Combine by Position)
  ↓
Slack (summary: "enriched 500/500 in <elapsed>ms", with elapsed computed from a start timestamp recorded at the top of the workflow)

batchSize=10 is a compromise: larger than 1 so the run finishes in reasonable time, small enough to stay under API rate limits, which for many third-party APIs sit somewhere between 10 and 100 requests per second. Tune to your weakest downstream. For the cost implications of running this kind of volume, see n8n vs Zapier self-hosting cost.
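The batching idea can be sketched outside n8n in plain JavaScript: batches run one after another, while the items inside a batch run concurrently. This is an illustration of the technique, not a description of n8n internals; `chunk`, `processInBatches`, and the `worker` callback are hypothetical names.

```javascript
// Split an array into chunks of `size` (the role SplitInBatches plays in the flow above).
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run `worker` over all items, `batchSize` at a time.
// Batches are sequential; items within a batch are concurrent.
// Results keep input order, and one failed item does not abort the rest.
async function processInBatches(items, batchSize, worker) {
  const results = [];
  for (const batch of chunk(items, batchSize)) {
    const settled = await Promise.allSettled(batch.map((item) => worker(item)));
    results.push(...settled);
  }
  return results;
}
```

With 500 leads and a batch size of 10, at most 10 calls are in flight at any moment, which is the knob to turn when a downstream API starts returning rate-limit errors.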

3. Retries with exponential backoff

Every HTTP node has built-in retry: max tries + wait-between-tries. That handles 80% of transient failures. For the other 20% — where you want exponential backoff, jitter, or a circuit breaker — wrap the call in a sub-workflow:

// Inside the retry sub-workflow, before the HTTP node:
const attempt = $json.attempt ?? 0;
const baseMs = 1000;
const maxMs = 30000;
const jitter = Math.random() * 500;
const waitMs = Math.min(2 ** attempt * baseMs + jitter, maxMs);

await new Promise((r) => setTimeout(r, waitMs));
return [{ json: { ...$json, attempt: attempt + 1, waitMs } }];

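Then the parent calls the sub-workflow once per attempt, passing an incremented attempt value, and with up to 6 tries the waits come out to 1s, 2s, 4s, 8s, 16s, 30s (plus jitter, capped at 30s). The counter has to be carried by the caller, for example a loop in the parent, because a node-level Retry On Fail re-runs the node with the same input and would leave attempt at 0 every time. The Code node above is a handful of lines; the equivalent in Zapier is "buy a higher tier or write it externally".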
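As a sanity check on the schedule, the following sketch computes the capped delays without jitter and wraps them in a generic retry loop that carries the attempt counter itself. It is plain JavaScript rather than n8n configuration; `backoffMs`, `retryWithBackoff`, and the injectable `sleep` option are hypothetical names for illustration. Unlike the Code node above, this loop waits after a failure rather than before each try.

```javascript
// Delay for a 0-based attempt number, without jitter, capped at maxMs.
function backoffMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(2 ** attempt * baseMs, maxMs);
}

// Delays for attempts 0 through 5: 1s, 2s, 4s, 8s, 16s, then the 30s cap.
const schedule = [0, 1, 2, 3, 4, 5].map((attempt) => backoffMs(attempt));

// Call `fn` until it succeeds or `maxTries` is reached, waiting between failures.
// `sleep` is injectable so the loop can be exercised without real delays.
async function retryWithBackoff(
  fn,
  { maxTries = 6, sleep = (ms) => new Promise((r) => setTimeout(r, ms)) } = {}
) {
  let lastError;
  for (let attempt = 0; attempt < maxTries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // No point waiting after the final failed try.
      if (attempt < maxTries - 1) await sleep(backoffMs(attempt));
    }
  }
  throw lastError;
}
```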

4. Idempotency: stop processing the same webhook twice

Webhooks fire twice. APIs return success on the retry. Cron triggers overlap when a previous run is still going. Every workflow with a side effect (charge a card, send an email, write to a CRM) needs an idempotency layer. The three-node pattern:

Webhook trigger
  ↓
Code node — compute idempotency key:
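  // On self-hosted instances, built-in modules such as crypto may need to be
  // allowed for Code nodes via NODE_FUNCTION_ALLOW_BUILTIN.
  const crypto = require('crypto');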
  const key = crypto.createHash('sha256')
    .update(JSON.stringify({ id: $json.id, event: $json.event }))
    .digest('hex');
  return [{ json: { ...$json, idempotencyKey: key } }];
  ↓
Redis (GET {{ $json.idempotencyKey }})
  ↓
IF (value exists?)
  true  → NoOp (already processed, exit)
  false → [side effect] → Redis (SET key with TTL=24h)

Use Postgres instead of Redis if you don't already run Redis — the pattern is identical with SELECT / INSERT ON CONFLICT DO NOTHING. The TTL is whichever is longer: your retry window or your support ticket SLA.
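One caveat with the GET-then-SET flow is that two near-simultaneous deliveries can both see "no key" before either writes it. A common variant claims the key in a single atomic step before the side effect and releases it if the side effect fails. The sketch below illustrates that variant in plain JavaScript; `ClaimStore` is a hypothetical in-memory stand-in for a store with an atomic set-if-absent operation (in Redis, SET with the NX and EX options), and `handleOnce` is likewise an illustrative name.

```javascript
const crypto = require('crypto');

// Same key derivation as the Code node above: hash only the identifying fields.
function idempotencyKey(payload) {
  return crypto
    .createHash('sha256')
    .update(JSON.stringify({ id: payload.id, event: payload.event }))
    .digest('hex');
}

// Hypothetical in-memory stand-in for an atomic "set if absent with TTL" store.
class ClaimStore {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.expiry = new Map();
  }

  // Returns true only for the first caller within the TTL window.
  claim(key, ttlMs) {
    const expiresAt = this.expiry.get(key);
    if (expiresAt !== undefined && expiresAt > this.now()) return false;
    this.expiry.set(key, this.now() + ttlMs);
    return true;
  }

  release(key) {
    this.expiry.delete(key);
  }
}

function handleOnce(store, payload, sideEffect, ttlMs = 24 * 60 * 60 * 1000) {
  const key = idempotencyKey(payload);
  if (!store.claim(key, ttlMs)) return 'skipped';
  try {
    sideEffect(payload);
  } catch (err) {
    store.release(key); // let a later retry attempt the side effect again
    throw err;
  }
  return 'processed';
}
```

The trade-off is that claiming first closes the duplicate window but needs the release step on failure, whereas writing the key only after success (as in the flow above) is simpler and tolerates the occasional concurrent duplicate.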

5. Sub-workflows: refactor at 30 nodes or 2 callers

Sub-workflows in n8n are typed function calls — they take input items, produce output items, and version independently. The three triggers to extract one:

Reuse. The same logic is called from two or more parent workflows. Extract it before the third one diverges.
Readability. The parent workflow has >30 nodes and the canvas needs scrolling. Extract by responsibility (auth, enrich, persist, notify) into 4 sub-workflows of 8-10 nodes each.
Partial retry. You want to retry the "persist" stage without re-running "auth" and "enrich". Each stage as a sub-workflow gives you per-stage retry and per-stage error workflows.

Naming convention that scales: {domain}.{verb} — e.g. billing.charge, billing.refund, billing.dunning-step. Folder structure in Git follows the same.
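If you want to enforce the convention in CI, a small check is enough. The sketch below assumes lowercase names with optional hyphenated verbs, and the `workflows/{domain}/{verb}.json` layout is an assumed example of "folder structure follows the same", not something n8n prescribes; `isValidWorkflowName` and `exportPath` are hypothetical helpers.

```javascript
// {domain}.{verb}: lowercase alphanumerics, with optional hyphen-separated parts in the verb.
const NAME_PATTERN = /^[a-z][a-z0-9]*\.[a-z][a-z0-9]*(-[a-z0-9]+)*$/;

function isValidWorkflowName(name) {
  return NAME_PATTERN.test(name);
}

// Map a workflow name to its export location in the repository (assumed layout).
function exportPath(name) {
  if (!isValidWorkflowName(name)) {
    throw new Error(`Invalid workflow name: ${name}`);
  }
  const [domain, verb] = name.split('.');
  return `workflows/${domain}/${verb}.json`;
}
```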

6. Error workflows: the one feature that pays for itself

In Workflow Settings → Error Workflow, point every production workflow at a single error-handling workflow. When any node fails (after node-level retries), n8n invokes that error workflow with the failed execution payload:

{
  "execution": {
    "id": "abc123",
    "url": "https://n8n.your-domain/execution/abc123",
    "retryOf": null,
    "error": { "message": "...", "stack": "..." },
    "lastNodeExecuted": "HTTP Request — charge customer",
    "mode": "trigger"
  },
  "workflow": { "id": "wf_42", "name": "Billing: charge customer" }
}

A typical error workflow does four things:

Alert — Slack/PagerDuty with severity by workflow tag.
Log — append to your observability stack (Datadog, Loki, OpenSearch).
Classify — Code node decides: retryable (network blip), business (validation failed), or critical (auth revoked).
Act — auto-retry the parent execution for retryable, open a ticket for business, page on-call for critical.

One error workflow per project, not per workflow. Treat it as your workflow-level catch.
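The Classify step could look roughly like the following Code-node-style function, which reads the error message from the payload shape shown above. The regular expressions are assumed heuristics for illustration, and `classifyError` and the action labels are hypothetical names; tune them to the errors your own workflows actually produce.

```javascript
// Assumed heuristics: network-level failures and throttling are worth retrying.
const RETRYABLE = /ETIMEDOUT|ECONNRESET|ECONNREFUSED|socket hang up|\b(429|502|503|504)\b/i;
// Assumed heuristics: auth failures will not fix themselves on retry.
const CRITICAL = /\b(401|403)\b|unauthorized|forbidden|invalid credentials/i;

function classifyError(payload) {
  const message = payload?.execution?.error?.message ?? '';
  if (CRITICAL.test(message)) {
    return { category: 'critical', action: 'page-on-call' };
  }
  if (RETRYABLE.test(message)) {
    return { category: 'retryable', action: 'retry-parent' };
  }
  // Everything else is treated as a business error that a human should look at.
  return { category: 'business', action: 'open-ticket' };
}
```

Checking the critical pattern first means an auth failure is never retried just because the same message also mentions a timeout.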

7. Scaling complex workflows in production

The patterns above are correctness primitives. Once correctness is solved, scale comes from infrastructure:

Queue mode. EXECUTIONS_MODE=queue + Redis + N worker replicas. Linear horizontal scale; the only mode you should run in production past ~10k executions/day.
Managed Postgres. The default SQLite is fine for dev, never for prod. Move to managed Postgres before you have 100k executions of history to migrate.
Worker autoscaling. On Kubernetes, scale workers on Redis queue depth, not CPU. Idle workers cost almost nothing.
Execution data pruning. Set EXECUTIONS_DATA_PRUNE=true and EXECUTIONS_DATA_MAX_AGE=336 (14 days). Otherwise the DB grows unboundedly and the UI gets slow.
External secrets. Pull credentials from Vault / AWS Secrets Manager on Enterprise. On community, inject via $env + Docker secrets — works fine for most teams.
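Scaling on queue depth rather than CPU comes down to a small calculation, sketched below. The thresholds (`jobsPerWorker`, `minWorkers`, `maxWorkers`) are illustrative assumptions rather than n8n defaults, and `desiredWorkers` is a hypothetical helper; in practice an autoscaler would evaluate something equivalent against the Redis queue length.

```javascript
// Pick a worker count from the number of queued executions, clamped to a range.
function desiredWorkers(
  queueDepth,
  { jobsPerWorker = 10, minWorkers = 1, maxWorkers = 10 } = {}
) {
  const needed = Math.ceil(queueDepth / jobsPerWorker);
  return Math.min(maxWorkers, Math.max(minWorkers, needed));
}

// EXECUTIONS_DATA_MAX_AGE is expressed in hours: 14 days of history is 336 hours.
const PRUNE_MAX_AGE_HOURS = 14 * 24;
```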

For the broader cost picture at each scale band — including when self-hosting stops paying off — see n8n vs Zapier self-hosting cost.

8. Real complex-workflow use cases

Multi-stage billing pipeline. Stripe webhook → idempotency check → Switch on event type → sub-workflow per type (charge / refund / dispute) → error workflow logs and opens a ticket on failure. ~40 nodes total, 3 sub-workflows, 1 error workflow.
AI document pipeline. S3 trigger → SplitInBatches → LangChain summarize + vector embed in parallel → Postgres upsert → Slack digest. Self-hosted with a local Ollama model means zero per-token cost.
SLA breach watcher. Cron every 5 min → Postgres query for open tickets past SLA → Switch on severity → escalate via PagerDuty / Slack / email. Idempotency key per ticket per hour to avoid spam.
CI/CD release coordinator. GitHub Release webhook → branch on semver → sub-workflows for changelog generation, Docker build trigger, multi-channel announce, customer-tier-aware email. One workflow, replaces three CI scripts and a Slack bot.
Customer onboarding state machine. Sign-up webhook → wait nodes between stages (welcome → 24h reminder → 7d activation check → 14d at-risk) → sub-workflow per stage. n8n's wait nodes hibernate the execution — no polling cost.
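For the SLA breach watcher, "one idempotency key per ticket per hour" can be sketched as a key built from the ticket id and an hour bucket, so that the twelve cron runs within the same hour all produce the same key. The `sla-escalation:` prefix and the function name are hypothetical; the bucket uses UTC to avoid daylight-saving surprises.

```javascript
// Build a key that is stable for a given ticket within one UTC hour.
function escalationKey(ticketId, now = new Date()) {
  // toISOString() is always UTC; the first 13 characters are "YYYY-MM-DDTHH".
  const hourBucket = now.toISOString().slice(0, 13);
  return `sla-escalation:${ticketId}:${hourBucket}`;
}
```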

9. When the workflow is too complex for any workflow tool

Honest signal: if a workflow has >5 sub-workflows, custom error classification logic, per-stage retry policies, and a state machine that needs persistence — it might be an application, not a workflow. The escape hatches:

Stay in n8n if the surface area is mostly integrations (90% of cases). The patterns above scale further than people expect.
Move to Windmill if you want the workflow tool to behave more like a script runner with a UI on top.
Move to a real backend (Temporal, durable-execution libraries) if you need exactly-once semantics across multi-hour workflows with strict ordering.

For a wider landscape, see best Zapier alternatives.
