How CitationBench's single generic agent loads skills, calls tools, pauses for approval, and persists every decision — the three-layer mental model for autonomous SEO ops.

CitationBench has one agent. It's a generic, durable, tool-using process that runs whatever task you hand it. The variety comes from skills — packaged capabilities the agent loads on demand (e.g., bootstrap_brand, rank_monitor, link_hunter, refresh_stale).

You invoke the agent the same way you call any other endpoint — but instead of returning a single result, the agent runs a graph of tool calls, can pause for human approval, can spawn child runs, and persists every decision it makes.

This page explains how we think about the agent: what it is, what it isn't, and how it relates to the rest of the platform.

The mental model

CitationBench has three layers. Most platforms only have one.

Layer 3  Agent + Skills      ← one generic agent. agent.invoke(skill, input).
                               Skills are named capabilities (bootstrap_brand,
                               link_hunter, …). The agent loads them, plans,
                               calls tools, pauses for approval, finishes.

Layer 2  Tools (REST + MCP)  ← composable primitives. research.keyword.research(...),
                               produce.blog_post.create(...), indexing.gsc.submit(...)

Layer 1  Resources           ← persistent objects. Keyword, BlogPost, LandingPage,
                               LinkBuildingRelationship, AgentInvocation, ...

The split matters because the agent is not magic — it's just a program that loads a skill, asks the LLM what tool to call, calls it, appends the result, and repeats until done. Every step is observable, replayable, and human-approvable.
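
The loop described above can be sketched in a few lines. This is an illustrative sketch, not CitationBench's implementation: `runAgent`, `Decide`, `ToolRegistry`, and the step shape are hypothetical names, and the "LLM" is a plain function so the example runs offline.

```typescript
// Hypothetical types: none of these names come from the CitationBench API.
type ToolCall = { tool: string; args: Record<string, unknown> };
type NextAction =
  | { kind: "call"; call: ToolCall }
  | { kind: "done"; result: unknown };
type Step = { call: ToolCall; output: unknown };

// Stand-in for the LLM: given the skill prompt and the steps so far, pick the next action.
type Decide = (skillPrompt: string, history: readonly Step[]) => NextAction;
type ToolRegistry = Record<string, (args: Record<string, unknown>) => unknown>;

function runAgent(
  skillPrompt: string,
  decide: Decide,
  tools: ToolRegistry,
  maxLlmCalls: number,
): { result: unknown; history: Step[] } {
  const history: Step[] = [];
  for (let llmCalls = 0; llmCalls < maxLlmCalls; llmCalls++) {
    const action = decide(skillPrompt, history); // ask what to do next
    if (action.kind === "done") {
      return { result: action.result, history };
    }
    const tool = tools[action.call.tool];
    if (tool === undefined) {
      throw new Error(`Unknown tool: ${action.call.tool}`);
    }
    // Call the tool and append the output so the next decision can see it.
    history.push({ call: action.call, output: tool(action.call.args) });
  }
  throw new Error("LLM call budget exhausted before the skill finished");
}

// Usage with a scripted "LLM" that calls one tool, then finishes.
const demoTools: ToolRegistry = {
  "research.keyword.research": (args) => [`${String(args["seed"])} pricing`],
};
const scriptedDecide: Decide = (_prompt, history) =>
  history.length === 0
    ? { kind: "call", call: { tool: "research.keyword.research", args: { seed: "acme" } } }
    : { kind: "done", result: history[0]?.output };

const demoRun = runAgent("bootstrap_brand prompt", scriptedDecide, demoTools, 5);
```

Because every step lands in `history`, the same record can serve as the replay log and as the place where an approval pause would be checked before a tool call.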

You can:

Skip the agent entirely — call tools directly from your own code. Use CitationBench as a data layer.
Use a built-in skill — bootstrap_brand, rank_monitor, link_hunter, citation_hunter, content_factory, refresh_stale, keyword_manager, link_swap_evaluator. These are battle-tested skill definitions you invoke via the generic agent.
Bring your own agent loop — run your own LLM (Claude, GPT, Gemini) and use CitationBench tools via MCP. CitationBench becomes your tool layer; the agent loop runs wherever you want.

Why we modeled it this way

Three design constraints shaped this.

1. Durability beats brilliance. Agents that run for 20 minutes and silently lose state are useless. Every CitationBench agent invocation is a durable record — it survives restarts, can be queried at any time, and produces an immutable replay log. If a step fails, you see exactly which step, with what input, and what the LLM responded.

2. Approval is a first-class state. Agencies don't want autonomous publishing or autonomous outreach. They want speed plus the ability to gate any outside-world action behind a human approval. So every skill step can declare requiresApproval: true. When the agent reaches such a step it stops, the invocation status moves to WAITING_APPROVAL, and the run resumes only when an approver approves it.

3. One agent, many skills — not many agents. Earlier designs had a registry of named agents. We collapsed that into one generic agent + a skills registry because:

Skills compose. The agent running bootstrap_brand may load and apply research.keyword and research.competitor skills mid-run. Treating them all as peer skills made composition obvious.
One agent has one lifecycle, one observability surface, one budget model. No per-agent special cases.
Users author new skills (prompt templates + tool lists) without us shipping a new "agent."

The data model

Three things back every agent invocation. You'll see them in API responses.

Invocation (the run)

One invocation = one agent run for one skill (which may chain into child invocations for sub-skills).

Field	Notes
`invocationId`	`inv_***` (CUID)
`agentId`	`agt_***` (CUID) — the specific agent instance that ran this invocation. Useful for audit / reproducibility / linking to debug traces.
`skill`	The skill that was invoked (`bootstrap_brand`, `link_hunter`, ...)
`skillsUsed`	All skills the agent actually loaded during the run (often more than just the primary)
`parentInvocationId`	Null for root; set for skills the agent chained into
`rootInvocationId`	Stable across the whole graph
`depth`	0 at root, 1 for children, 2 for grandchildren, ...
`brief`	Plain-language summary of what this invocation is doing
`mode`	`FOREGROUND` (synchronous wait) or `BACKGROUND` (fire and forget)
`status`	`PENDING`, `RUNNING`, `WAITING_INPUT`, `WAITING_APPROVAL`, `WAITING_CHILDREN`, `SUCCEEDED`, `FAILED`, `CANCELLED`
`result`	Final structured output (when status = SUCCEEDED), shape defined by the skill's outputSchema
`joinPolicy`	`ALL` (wait for every child) or `ANY` (resume when any child finishes)
`maxLlmCalls` / `llmCallsUsed`	Budget guardrails
`lastHeartbeatAt`	Liveness signal for stuck-invocation detection
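
As a rough sketch, the table above maps onto a TypeScript type like the following. Field names and enum values come from the table; the concrete types (string, number, nullability, ISO timestamps) and the helper functions `isTerminal`, `hasLlmBudget`, and `childrenSatisfied` are assumptions made for illustration.

```typescript
type InvocationStatus =
  | "PENDING" | "RUNNING" | "WAITING_INPUT" | "WAITING_APPROVAL"
  | "WAITING_CHILDREN" | "SUCCEEDED" | "FAILED" | "CANCELLED";

interface Invocation {
  invocationId: string;
  agentId: string;
  skill: string;
  skillsUsed: string[];
  parentInvocationId: string | null; // null for root
  rootInvocationId: string;
  depth: number;
  brief: string;
  mode: "FOREGROUND" | "BACKGROUND";
  status: InvocationStatus;
  result?: unknown; // present when status is SUCCEEDED
  joinPolicy: "ALL" | "ANY";
  maxLlmCalls: number;
  llmCallsUsed: number;
  lastHeartbeatAt: string; // assumed to be an ISO 8601 timestamp
}

// Assumption: these three statuses are the terminal ones.
const TERMINAL_STATUSES: ReadonlySet<InvocationStatus> =
  new Set<InvocationStatus>(["SUCCEEDED", "FAILED", "CANCELLED"]);

function isTerminal(status: InvocationStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Budget guardrail: true while the run may still make another LLM call.
function hasLlmBudget(inv: Pick<Invocation, "maxLlmCalls" | "llmCallsUsed">): boolean {
  return inv.llmCallsUsed < inv.maxLlmCalls;
}

// joinPolicy: ALL waits for every child, ANY resumes once any child has finished.
function childrenSatisfied(
  policy: Invocation["joinPolicy"],
  childStatuses: readonly InvocationStatus[],
): boolean {
  if (childStatuses.length === 0) return true; // nothing to wait for
  return policy === "ALL"
    ? childStatuses.every((s) => isTerminal(s))
    : childStatuses.some((s) => isTerminal(s));
}
```

A polling client would typically stop once `isTerminal(invocation.status)` is true and read `result` only when the status is `SUCCEEDED`.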

Session (the conversation)

A session is a series of related invocations. Multi-turn chats live in one session. Useful for skills you talk to (keyword_manager, custom conversational skills).

Field	Notes
`sessionId`	`sess_***`
`title`	Human-readable label
`messages`	Full conversation log (system, user, assistant turns)
`loadedSkills`	Which skills the agent had access to in this session

Approval (the gate)

When a step pauses, an Approval record is created. Approving resumes the agent; rejecting kills the invocation.

Field	Notes
`approvalId`	`appr_***`
`invocationId`	Links to the paused invocation
`approver`	Email or user ID
`decision`	`APPROVED` or `REJECTED`
`decidedAt`	When the human acted
`note`	Free-text reason / edit notes
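
The gate can be pictured as a small state transition. The sketch below is hypothetical: `applyDecision` is not a CitationBench function, and mapping a rejection to `CANCELLED` is an assumption, since this page only says that rejecting kills the invocation.

```typescript
type ApprovalDecision = "APPROVED" | "REJECTED";
type GateStatus = "WAITING_APPROVAL" | "RUNNING" | "CANCELLED";

interface Approval {
  approvalId: string;
  invocationId: string;
  approver: string; // email or user ID
  decision: ApprovalDecision;
  decidedAt: string; // assumed ISO 8601 timestamp
  note?: string;
}

// Assumption: approval resumes the run (RUNNING); rejection ends it (CANCELLED).
function applyDecision(current: GateStatus, approval: Approval): GateStatus {
  if (current !== "WAITING_APPROVAL") {
    throw new Error(`Invocation ${approval.invocationId} is not waiting for approval`);
  }
  return approval.decision === "APPROVED" ? "RUNNING" : "CANCELLED";
}

const resumed = applyDecision("WAITING_APPROVAL", {
  approvalId: "appr_demo",
  invocationId: "inv_demo",
  approver: "editor@example.com",
  decision: "APPROVED",
  decidedAt: "2024-01-01T00:00:00Z",
});
```

Refusing decisions on invocations that are not paused keeps a late or duplicate approval from changing a run that has already moved on.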

The universal response envelope

Every terminal invocation response carries five fields you'll see across the entire API:

Field	What it gives you
`invocationId`	Stable handle for this run; query, replay, cancel, audit
`agentId`	`agt_***` — the specific agent instance that ran
`result`	The typed structured output, shape defined by the skill's `outputSchema`
`raw`	The agent's raw text — its narration, reasoning, what it was about to do next
`files`	Array of file paths the agent wrote during the run — scratch notes, intermediate artifacts, final outputs. Read with Agent · files.

Treat result as the contract; raw + files are the audit trail. When you need the why behind a decision (and the structured result doesn't carry it), read raw and the files.
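
One way to keep that separation honest in client code is to type the envelope generically and split it on arrival. This is a sketch under assumptions: the five field names come from the table above, while `splitEnvelope`, `AuditTrail`, the sample result shape, and the sample file path are hypothetical.

```typescript
interface InvocationEnvelope<TResult> {
  invocationId: string;
  agentId: string;
  result: TResult; // the contract, shaped by the skill's outputSchema
  raw: string;     // the agent's narration and reasoning
  files: string[]; // paths the agent wrote during the run
}

interface AuditTrail {
  invocationId: string;
  agentId: string;
  narration: string;
  files: string[];
}

// Business logic consumes `contract`; `audit` is stored for later inspection.
function splitEnvelope<TResult>(
  env: InvocationEnvelope<TResult>,
): { contract: TResult; audit: AuditTrail } {
  return {
    contract: env.result,
    audit: {
      invocationId: env.invocationId,
      agentId: env.agentId,
      narration: env.raw,
      files: env.files,
    },
  };
}

// Hypothetical result shape, for illustration only.
const sampleEnvelope: InvocationEnvelope<{ keywordsFound: number }> = {
  invocationId: "inv_demo",
  agentId: "agt_demo",
  result: { keywordsFound: 42 },
  raw: "Crawled the site, then clustered keywords by intent.",
  files: ["notes/plan.md"],
};
const { contract, audit } = splitEnvelope(sampleEnvelope);
```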

How agents fit with everything else

Concept	Relationship to Agent
Workspaces	Every invocation is scoped to one workspace. Cross-workspace runs spawn one child per workspace.
Tools	The skills' "actions." Skills are built out of CitationBench tools; custom skills can use yours too.
Approval Workflows	Each skill step can declare `requiresApproval: true`. State machine described above.
Durability	Invocations are durable — they survive restarts and produce an immutable replay log under the hood. You don't manage the orchestrator yourself.
Prompt templates	Skills are defined as prompt templates with a tool-access list. You can read, fork, or override them.
Files	The agent can read uploaded files and write its own workspace files during a run.

The built-in skill catalog

Eight built-in skills at v1. Each is fully observable, fully approvable, fully replayable.

Skill	What it does	Calls tools from
`bootstrap_brand`	URL → full SEO+GEO operating plan in 20 min	`produce.crawl` → `research.icp` → `research.keyword` → `research.competitor` → `research.discuss` → `produce.blog_post` (planning) → `produce.landing_page` (briefs)
`rank_monitor`	Recurring rank checks with conditional follow-ups	`distribute.track_rank` (cron) + optional `refresh_stale` on drop
`link_hunter`	End-to-end link building	`link_building.serp_outreach` → `link_building.crm.contact.discover` → `link_building.campaign.send_email`
`citation_hunter`	Daily AI search citation tracking + reclamation	`research.ai_citation.check` → on drop, `produce.refine`
`content_factory`	Keyword → research → draft → refine → publish	`research.discuss` → `produce.blog_post` → `produce.refine` → `produce.publish`
`refresh_stale`	Rank drop or citation drop → content audit → updated draft	`distribute.track_rank` → `produce.evaluate` → `produce.refine`
`keyword_manager`	Conversational keyword DB management	`research.keyword.list/update/relabel`
`link_swap_evaluator`	Score a partner's link-swap proposal	`research.competitor.backlinks` + Ahrefs DR lookups

You can also:

Fork a built-in skill (agent.skills.fork(slug)) to make a custom workspace-scoped version with different defaults.
Define your own by registering a new prompt template with the available tool list — no code deploy needed.
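
This page does not show the schema of a skill definition, so the following is only a guess at its general shape: a prompt template, a list of steps with the tools each may call, and an optional `requiresApproval` flag. `SkillDefinition`, `SkillStep`, and `missingTools` are hypothetical names; the sketch shows the kind of check you might run before registering a custom skill.

```typescript
// Hypothetical shapes, not the CitationBench schema.
interface SkillStep {
  name: string;
  tools: string[];
  requiresApproval?: boolean;
}

interface SkillDefinition {
  slug: string;
  promptTemplate: string;
  steps: SkillStep[];
}

// Return every tool the skill references that is not in the available tool list.
function missingTools(skill: SkillDefinition, available: readonly string[]): string[] {
  const known = new Set<string>(available);
  const missing = new Set<string>();
  for (const step of skill.steps) {
    for (const tool of step.tools) {
      if (!known.has(tool)) missing.add(tool);
    }
  }
  return Array.from(missing);
}

const weeklyAudit: SkillDefinition = {
  slug: "my-weekly-audit",
  promptTemplate: "Audit rankings for the workspace and draft fixes.",
  steps: [
    { name: "check", tools: ["distribute.track_rank"] },
    { name: "publish", tools: ["produce.publish"], requiresApproval: true },
  ],
};

const unknownTools = missingTools(weeklyAudit, ["distribute.track_rank", "produce.refine"]);
```

Here `unknownTools` contains `produce.publish`, because that tool is not in the list passed as available.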

Code samples

REST

# Invoke a skill
curl -X POST https://api.citationbench.com/v1/agent/invoke \
  -H "Authorization: Bearer sk_live_***" \
  -H "X-Workspace-Id: ws_acme" \
  -H "Content-Type: application/json" \
  -d '{
    "skill": "bootstrap_brand",
    "input": { "domain": "acme.com", "depth": "thorough" },
    "approval": { "required": true },
    "mode": "BACKGROUND"
  }'

# → 202 Accepted
# {
#   "invocationId": "inv_01HVZ...",
#   "agentId":      "agt_01HVZ...",
#   "skill":        "bootstrap_brand",
#   "status":       "PENDING",
#   "links": { ... }
# }
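
For TypeScript callers, the same request can be assembled as plain data and handed to any HTTP client. The URL, headers, and body fields mirror the curl call above; `buildInvokeRequest` and the `HttpRequest` shape are hypothetical helpers, not part of an official SDK.

```typescript
interface InvokeBody {
  skill: string;
  input: Record<string, unknown>;
  approval?: { required: boolean };
  mode?: "FOREGROUND" | "BACKGROUND";
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the request without sending it, so it can be logged or tested first.
function buildInvokeRequest(apiKey: string, workspaceId: string, body: InvokeBody): HttpRequest {
  return {
    url: "https://api.citationbench.com/v1/agent/invoke",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "X-Workspace-Id": workspaceId,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}

const invokeRequest = buildInvokeRequest("sk_live_***", "ws_acme", {
  skill: "bootstrap_brand",
  input: { domain: "acme.com", depth: "thorough" },
  approval: { required: true },
  mode: "BACKGROUND",
});
```

With the Fetch API, sending it is `fetch(invokeRequest.url, { method: invokeRequest.method, headers: invokeRequest.headers, body: invokeRequest.body })`.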

MCP (natural language)

> Bootstrap acme.com — full SEO and GEO research. Pause at each step for me to approve.

Claude calls agent.invoke with skill: "bootstrap_brand" and the right input. The MCP server streams progress as notifications.

Common patterns

1. Fire-and-forget at agency scale

For agencies, the common pattern is BACKGROUND mode with cross-workspace fan-out. One call kicks off the same skill across every client workspace.

curl -X POST https://api.citationbench.com/v1/workspaces/bulk-action \
  -H "Authorization: Bearer sk_live_***" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "agent.invoke",
    "workspaces": "all",
    "config": {
      "skill": "rank_monitor",
      "input": { "alertOn": { "drop": 5 } }
    }
  }'
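
Since cross-workspace runs spawn one child per workspace, the resulting invocation graph can be pictured with the `parentInvocationId`, `rootInvocationId`, and `depth` fields from the data model. The sketch below is illustrative only: `fanOut`, `RunNode`, and the ID scheme are hypothetical, and the real fan-out happens server-side.

```typescript
interface RunNode {
  invocationId: string;
  parentInvocationId: string | null;
  rootInvocationId: string;
  depth: number;
  workspaceId: string | null; // null for a cross-workspace root
}

// Create one child per workspace, linked back to the parent and the root.
function fanOut(
  parent: RunNode,
  workspaceIds: readonly string[],
  newId: (workspaceId: string) => string,
): RunNode[] {
  return workspaceIds.map((workspaceId) => ({
    invocationId: newId(workspaceId),
    parentInvocationId: parent.invocationId,
    rootInvocationId: parent.rootInvocationId,
    depth: parent.depth + 1,
    workspaceId,
  }));
}

const bulkRoot: RunNode = {
  invocationId: "inv_root",
  parentInvocationId: null,
  rootInvocationId: "inv_root",
  depth: 0,
  workspaceId: null,
};
const bulkChildren = fanOut(bulkRoot, ["ws_acme", "ws_globex"], (ws) => `inv_${ws}`);
```

Every child shares the root's `rootInvocationId`, which is what lets you query the whole fan-out as one graph.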

2. Foreground with streaming

When a human is watching (Claude Code, CLI), invoke in FOREGROUND mode and follow the run's SSE event stream.

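# Start the run and capture its ID ($KEY holds your API key)
INVOCATION=$(curl -sf -X POST .../agent/invoke -H "Authorization: Bearer $KEY" -d '{...}' | jq -r '.invocationId')
# Stream events as they arrive (-N disables curl's output buffering)
curl -N -H "Authorization: Bearer $KEY" \
  "https://api.citationbench.com/v1/agent/invocations/$INVOCATION/events"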

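If you consume the stream from code rather than curl, you need to split it into events. The sketch below parses the standard Server-Sent Events wire format (`event:` and `data:` fields, blank-line separators, `:` comment lines). It assumes the input contains only complete events, and the `status` event name and JSON payload in the sample are hypothetical, since this page does not list the event types.

```typescript
interface SseEvent {
  event: string; // "message" when the stream does not name the event
  data: string;
}

function parseSse(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Events without any data lines are not dispatched.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// Hypothetical sample stream: one named event, one keep-alive comment, one multi-line event.
const sampleStream =
  'event: status\ndata: {"status":"RUNNING"}\n\n: keep-alive\n\ndata: a\ndata: b\n\n';
const sampleEvents = parseSse(sampleStream);
```

A production client would also buffer partial chunks across network reads before parsing, which this sketch leaves out.
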
3. Approval everywhere outbound

Set approval.required: true on any skill whose steps touch the outside world (publishing, outreach, indexing). The agent pauses; you decide.

4. Compose your own skill

For workflows that don't match a built-in, register a custom skill via the prompt-template API. Then invoke it like any other skill.

curl -X POST https://api.citationbench.com/v1/agent/invoke \
  -H "Authorization: Bearer sk_live_***" \
  -H "Content-Type: application/json" \
  -d '{
    "skill": "custom:my-weekly-audit",
    "input": { "workspaceId": "ws_acme" }
  }'

The custom: prefix tells the system to load from your workspace-scoped skill registry instead of the built-in registry.
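
The prefix rule amounts to a small lookup decision. The sketch below is a hypothetical illustration of it; `resolveSkill` and the `SkillRef` shape are not CitationBench functions, and rejecting an empty slug is an assumption.

```typescript
type SkillRef = { registry: "workspace" | "builtin"; slug: string };

const CUSTOM_PREFIX = "custom:";

// Decide which registry a skill name points at, based on the custom: prefix.
function resolveSkill(skill: string): SkillRef {
  if (skill.startsWith(CUSTOM_PREFIX)) {
    const slug = skill.slice(CUSTOM_PREFIX.length);
    if (slug === "") {
      throw new Error("custom: prefix requires a skill slug");
    }
    return { registry: "workspace", slug };
  }
  return { registry: "builtin", slug: skill };
}

const customRef = resolveSkill("custom:my-weekly-audit");
const builtinRef = resolveSkill("rank_monitor");
```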

5. Multi-turn conversation

For conversational skills like keyword_manager, pass sessionId to continue.

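// `cb` is assumed to be an already-initialized CitationBench client.
const first = await cb.agent.invoke({
  skill: "keyword_manager",
  input: {
    message: "Show me PROBLEM_SOLUTION keywords missing landing pages.",
  },
});

// Reuse the sessionId from the first turn so the agent keeps the conversation context.
const second = await cb.agent.invoke({
  skill: "keyword_manager",
  sessionId: first.sessionId,
  input: { message: "Drop the ones with KD > 40." },
});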

API: Agent · invoke
API: Agent · files
API: Agent · approval
API: Inventory
Concept: Approval workflows
Concept: Prompt templates
Playbook: Keyword research for a brand in 20 minutes
Playbook: Build an SEO agent in Claude Code

Agent: one durable, tool-using process driven by skills