Agent: one durable, tool-using process driven by skills
How CitationBench's single generic agent loads skills, calls tools, pauses for approval, and persists every decision — the three-layer mental model for autonomous SEO ops.
CitationBench has one agent. It's a generic, durable, tool-using process that runs whatever task you hand it. The variety comes from skills — packaged capabilities the agent loads on demand (e.g., bootstrap_brand, rank_monitor, link_hunter, refresh_stale).
You invoke the agent the same way you call any other endpoint — but instead of returning a single result, the agent runs a graph of tool calls, can pause for human approval, can spawn child runs, and persists every decision it makes.
This page explains how we think about agents: what they are, what they aren't, and how they relate to the rest of the platform.
The mental model
CitationBench has three layers. Most platforms only have one.
```
Layer 3  Agent + Skills       ← one generic agent. agent.invoke(skill, input).
                                Skills are named capabilities (bootstrap_brand,
                                link_hunter, …). The agent loads them, plans,
                                calls tools, pauses for approval, finishes.
Layer 2  Tools (REST + MCP)   ← composable primitives. research.keyword.research(...),
                                produce.blog_post.create(...), indexing.gsc.submit(...)
Layer 1  Resources            ← persistent objects. Keyword, BlogPost, LandingPage,
                                LinkBuildingRelationship, AgentInvocation, ...
```
The split matters because the agent is not magic: it's just a program that loads a skill, asks the LLM what tool to call, calls it, appends the result, and repeats until done. Every step is observable, replayable, and human-approvable.
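As a rough illustration of that loop, here is a minimal TypeScript sketch. It is not CitationBench's implementation: the Skill, Tool, and Llm types, the runAgent function, and the transcript strings are hypothetical, the LLM and tools are injected plain functions, and the loop is synchronous for brevity (a real one would await both). The maxLlmCalls cap stands in for the budget guardrail described in the data model.

```typescript
// Minimal sketch of a generic tool-using agent loop. Hypothetical types and
// names; synchronous for brevity (a real loop would await LLM and tool calls).

type ToolCall = { tool: string; args: Record<string, unknown> };
type LlmReply = { kind: "call"; call: ToolCall } | { kind: "done"; output: string };
type Tool = (args: Record<string, unknown>) => unknown;
type Llm = (transcript: string[]) => LlmReply;

interface Skill {
  slug: string;
  prompt: string;  // the skill's prompt template, already filled in
  tools: string[]; // tool-access list: the only tools this skill may call
}

interface RunResult {
  status: "SUCCEEDED" | "FAILED";
  output?: string;
  transcript: string[]; // step-by-step log of the run
  llmCallsUsed: number;
}

function runAgent(skill: Skill, input: string, llm: Llm, tools: Record<string, Tool>, maxLlmCalls: number): RunResult {
  // Load the skill: its prompt and the caller's input seed the transcript.
  const transcript: string[] = [`system: ${skill.prompt}`, `user: ${input}`];
  let llmCallsUsed = 0;
  while (llmCallsUsed < maxLlmCalls) {
    // Ask the LLM what to do next.
    const reply = llm(transcript);
    llmCallsUsed++;
    if (reply.kind === "done") {
      return { status: "SUCCEEDED", output: reply.output, transcript, llmCallsUsed };
    }
    const { tool, args } = reply.call;
    const fn = tools[tool];
    if (!skill.tools.includes(tool) || fn === undefined) {
      // Refuse anything outside the skill's tool-access list.
      transcript.push(`error: tool ${tool} is not available to ${skill.slug}`);
      return { status: "FAILED", transcript, llmCallsUsed };
    }
    // Call the tool, append the call and its result, and go around again.
    transcript.push(`call: ${tool} ${JSON.stringify(args)}`);
    transcript.push(`result: ${JSON.stringify(fn(args))}`);
  }
  // Budget guardrail: stop once maxLlmCalls is spent.
  transcript.push("error: maxLlmCalls exhausted");
  return { status: "FAILED", transcript, llmCallsUsed };
}

// Usage: a fake LLM that calls one tool, then finishes.
const skill: Skill = { slug: "demo", prompt: "Find keyword ideas.", tools: ["research.keyword.research"] };
const tools: Record<string, Tool> = {
  "research.keyword.research": (args) => ({ keywords: [`${String(args["seed"])} checklist`] }),
};
const llm: Llm = (t) =>
  t.some((line) => line.startsWith("result:"))
    ? { kind: "done", output: "found keywords" }
    : { kind: "call", call: { tool: "research.keyword.research", args: { seed: "seo audit" } } };
const run = runAgent(skill, "acme.com", llm, tools, 5);
```

Injecting the LLM and tools as functions keeps the loop itself generic: the skill only contributes a prompt and a tool list.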
You can:
- Skip the agent entirely: call tools directly from your own code and use CitationBench as a data layer.
- Use a built-in skill: bootstrap_brand, rank_monitor, link_hunter, citation_hunter, content_factory, refresh_stale, keyword_manager, link_swap_evaluator. These are battle-tested skill definitions you invoke via the generic agent.
- Bring your own agent loop: run your own LLM (Claude, GPT, Gemini) and use CitationBench tools via MCP. CitationBench becomes your tool layer; the agent loop runs wherever you want.
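For the bring-your-own-loop option, the wire format is the Model Context Protocol: your loop sends JSON-RPC 2.0 tools/call requests and reads back a content array. The sketch below builds those messages by hand only to show their shape; in practice an MCP client library handles transport and framing. The tool name research.keyword.research and its seed argument are assumptions borrowed from the tool examples on this page, and toolCall and resultText are hypothetical helpers.

```typescript
// Hypothetical sketch: the JSON-RPC 2.0 messages an external agent loop
// exchanges with an MCP server. Envelope shape per the MCP spec; the tool
// name and arguments are assumptions.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface ToolResult {
  content: { type: string; text?: string }[];
  isError?: boolean;
}

let nextId = 1;

// Build the request your loop sends when the LLM picks a tool.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Collect the text content of a tool result to feed back to the LLM.
function resultText(result: ToolResult): string {
  if (result.isError === true) throw new Error("tool reported an error");
  return result.content
    .filter((c) => c.type === "text" && c.text !== undefined)
    .map((c) => c.text ?? "")
    .join("\n");
}

const req = toolCall("research.keyword.research", { seed: "seo audit" });
// Send JSON.stringify(req) over your MCP transport (stdio or HTTP), then parse the result:
const text = resultText({ content: [{ type: "text", text: "{\"keywords\":[]}" }] });
```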
Why we modeled it this way
Three design constraints shaped this.
1. Durability beats brilliance. Agents that run for 20 minutes and silently lose state are useless. Every CitationBench agent invocation is a durable record — it survives restarts, can be queried at any time, and produces an immutable replay log. If a step fails, you see exactly which step, with what input, and what the LLM responded.
2. Approval is a first-class state. Agencies don't want autonomous publishing or autonomous outreach. They want speed plus the ability to gate any outside-world action behind a human approval. So every skill step can declare requiresApproval: true. The agent literally stops, the invocation state moves to WAITING_APPROVAL, and resumes only when an approver acts.
3. One agent, many skills — not many agents. Earlier designs had a registry of named agents. We collapsed that into one generic agent + a skills registry because:
- Skills compose. The agent running bootstrap_brand may load and apply research.keyword and research.competitor skills mid-run. Treating them all as peer skills made composition obvious.
- One agent has one lifecycle, one observability surface, one budget model. No per-agent special cases.
- Users author new skills (prompt templates + tool lists) without us shipping a new "agent."
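Because a skill is data rather than a new agent type, composition reduces to loading more entries from a registry. The TypeScript sketch below is hypothetical (SkillRegistry, AgentRun, their methods, and the {{domain}} template placeholders are illustrative, not CitationBench's API): a run starts with bootstrap_brand, loads research.keyword mid-run, records both in skillsUsed, and widens its tool set accordingly.

```typescript
// Hypothetical sketch: skills as data (prompt template + tool list) in a
// registry, loaded by one generic run object. Not CitationBench's API.

interface SkillDef {
  slug: string;
  promptTemplate: string;
  tools: string[];
}

class SkillRegistry {
  private readonly skills = new Map<string, SkillDef>();

  register(def: SkillDef): void {
    this.skills.set(def.slug, def);
  }

  get(slug: string): SkillDef {
    const def = this.skills.get(slug);
    if (def === undefined) throw new Error(`unknown skill: ${slug}`);
    return def;
  }
}

class AgentRun {
  readonly skillsUsed: string[] = [];
  private readonly registry: SkillRegistry;
  private readonly allowedTools = new Set<string>();

  constructor(registry: SkillRegistry, primarySkill: string) {
    this.registry = registry;
    this.load(primarySkill);
  }

  // Loading another skill mid-run adds its tools; no new agent is involved.
  load(slug: string): void {
    const def = this.registry.get(slug);
    if (!this.skillsUsed.includes(slug)) this.skillsUsed.push(slug);
    for (const tool of def.tools) this.allowedTools.add(tool);
  }

  canCall(tool: string): boolean {
    return this.allowedTools.has(tool);
  }
}

const registry = new SkillRegistry();
registry.register({ slug: "bootstrap_brand", promptTemplate: "Plan SEO for {{domain}}", tools: ["produce.crawl"] });
registry.register({ slug: "research.keyword", promptTemplate: "Research keywords for {{domain}}", tools: ["research.keyword.research"] });

const run = new AgentRun(registry, "bootstrap_brand");
run.load("research.keyword"); // composed mid-run
```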
The data model
Three things back every agent invocation. You'll see them in API responses.
Invocation (the run)
One invocation = one agent run for one skill (which may chain into child invocations for sub-skills).
| Field | Notes |
|---|---|
| invocationId | inv_*** (CUID) |
| agentId | agt_*** (CUID). The specific agent instance that ran this invocation; useful for audit, reproducibility, and linking to debug traces. |
| skill | The skill that was invoked (bootstrap_brand, link_hunter, ...) |
| skillsUsed | All skills the agent actually loaded during the run (often more than just the primary) |
| parentInvocationId | Null for the root; set for skills the agent chained into |
| rootInvocationId | Stable across the whole graph |
| depth | 0 at root, 1 for children, 2 for grandchildren, ... |
| brief | Plain-language summary of what this invocation is doing |
| mode | FOREGROUND (synchronous wait) or BACKGROUND (fire and forget) |
| status | PENDING, RUNNING, WAITING_INPUT, WAITING_APPROVAL, WAITING_CHILDREN, SUCCEEDED, FAILED, CANCELLED |
| result | Final structured output (when status = SUCCEEDED), shape defined by the skill's outputSchema |
| joinPolicy | ALL (wait for every child) or ANY (resume when any child finishes) |
| maxLlmCalls / llmCallsUsed | Budget guardrails: the cap on LLM calls for this run and how many it has used so far |
| lastHeartbeatAt | Liveness signal for stuck-invocation detection |
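The graph fields in this table fit together in a predictable way: a child points at its parent, keeps the root, and sits one level deeper, and a parent in WAITING_CHILDREN resumes according to its joinPolicy. The TypeScript sketch below models that reading of the table. Field names follow the table, but the helper functions, the epoch-millisecond heartbeat, children inheriting their parent's maxLlmCalls, and the stale-heartbeat threshold are assumptions, not documented behavior.

```typescript
// Hypothetical model of the Invocation fields above; helpers are illustrative.

type Status =
  | "PENDING" | "RUNNING" | "WAITING_INPUT" | "WAITING_APPROVAL"
  | "WAITING_CHILDREN" | "SUCCEEDED" | "FAILED" | "CANCELLED";

interface Invocation {
  invocationId: string;
  skill: string;
  parentInvocationId: string | null;
  rootInvocationId: string;
  depth: number;
  status: Status;
  joinPolicy: "ALL" | "ANY";
  maxLlmCalls: number;
  llmCallsUsed: number;
  lastHeartbeatAt: number; // epoch milliseconds (assumption)
}

const TERMINAL = new Set<Status>(["SUCCEEDED", "FAILED", "CANCELLED"]);

// A child points at its parent, keeps the root, and sits one level deeper.
function childOf(parent: Invocation, invocationId: string, skill: string, now: number): Invocation {
  return {
    invocationId,
    skill,
    parentInvocationId: parent.invocationId,
    rootInvocationId: parent.rootInvocationId,
    depth: parent.depth + 1,
    status: "PENDING",
    joinPolicy: "ALL",
    maxLlmCalls: parent.maxLlmCalls, // assumption: children inherit the cap
    llmCallsUsed: 0,
    lastHeartbeatAt: now,
  };
}

// ALL waits for every child to reach a terminal status; ANY needs just one.
function canResume(parent: Invocation, children: Invocation[]): boolean {
  const finished = children.filter((c) => TERMINAL.has(c.status)).length;
  return parent.joinPolicy === "ALL" ? finished === children.length : finished > 0;
}

function overBudget(inv: Invocation): boolean {
  return inv.llmCallsUsed >= inv.maxLlmCalls;
}

// A RUNNING invocation whose heartbeat is older than the threshold looks stuck.
function isStuck(inv: Invocation, now: number, staleAfterMs: number): boolean {
  return inv.status === "RUNNING" && now - inv.lastHeartbeatAt > staleAfterMs;
}

const root: Invocation = {
  invocationId: "inv_root", skill: "bootstrap_brand", parentInvocationId: null,
  rootInvocationId: "inv_root", depth: 0, status: "WAITING_CHILDREN", joinPolicy: "ALL",
  maxLlmCalls: 50, llmCallsUsed: 12, lastHeartbeatAt: 0,
};
const keywords = childOf(root, "inv_kw", "research.keyword", 1000);
const competitors: Invocation = { ...childOf(root, "inv_comp", "research.competitor", 1000), status: "SUCCEEDED" };
```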
Session (the conversation)
A session is a series of related invocations. Multi-turn chats live in one session. Useful for skills you talk to (keyword_manager, custom conversational skills).
| Field | Notes |
|---|---|
| sessionId | sess_*** |
| title | Human-readable label |
| messages | Full conversation log (system, user, assistant turns) |
| loadedSkills | Which skills the agent had access to in this session |
Approval (the gate)
When a step pauses, an Approval record is created. Approving resumes the agent; rejecting kills the invocation.
| Field | Notes |
|---|---|
| approvalId | appr_*** |
| invocationId | Links to the paused invocation |
| approver | Email or user ID |
| decision | APPROVED or REJECTED |
| decidedAt | When the human acted |
| note | Free-text reason / edit notes |
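Putting the per-step requiresApproval flag, the WAITING_APPROVAL status, and the Approval record together gives a small state machine. The TypeScript sketch below is a hypothetical illustration (Run, Step, advance, and decide are not CitationBench names): steps run in order until a gated step lacks approval, an APPROVED decision resumes from that step, and a REJECTED one ends the run. Mapping rejection to CANCELLED is an assumption; this page only says rejection kills the invocation.

```typescript
// Hypothetical sketch of the approval gate; names and types are illustrative.

type Status = "RUNNING" | "WAITING_APPROVAL" | "SUCCEEDED" | "CANCELLED";

interface Step {
  id: string;
  requiresApproval?: boolean;
  run: () => void; // the step's side effect, e.g. publish or send outreach
}

interface Approval {
  approvalId: string;
  invocationId: string;
  approver: string;
  decision: "APPROVED" | "REJECTED";
  note?: string;
}

interface Run {
  invocationId: string;
  status: Status;
  nextStep: number;
  approvedSteps: Set<string>;
}

// Execute steps in order; park before the first gated step without approval.
function advance(run: Run, steps: Step[]): void {
  run.status = "RUNNING";
  while (run.nextStep < steps.length) {
    const step = steps[run.nextStep]!;
    if (step.requiresApproval && !run.approvedSteps.has(step.id)) {
      run.status = "WAITING_APPROVAL"; // stop before the side effect happens
      return;
    }
    step.run();
    run.nextStep++;
  }
  run.status = "SUCCEEDED";
}

// APPROVED resumes from the paused step; REJECTED ends the run (CANCELLED is an assumption).
function decide(run: Run, steps: Step[], approval: Approval): void {
  if (run.status !== "WAITING_APPROVAL" || approval.invocationId !== run.invocationId) {
    throw new Error(`approval ${approval.approvalId} does not match a paused run`);
  }
  if (approval.decision === "REJECTED") {
    run.status = "CANCELLED";
    return;
  }
  run.approvedSteps.add(steps[run.nextStep]!.id);
  advance(run, steps);
}

const effects: string[] = [];
const steps: Step[] = [
  { id: "draft", run: () => effects.push("draft") },
  { id: "publish", requiresApproval: true, run: () => effects.push("publish") },
];
const run: Run = { invocationId: "inv_1", status: "RUNNING", nextStep: 0, approvedSteps: new Set<string>() };
advance(run, steps); // pauses at "publish"; only "draft" has run
decide(run, steps, { approvalId: "appr_1", invocationId: "inv_1", approver: "editor@example.com", decision: "APPROVED" });
```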
The universal response envelope
Every terminal invocation response carries five fields you'll see across the entire API:
| Field | What it gives you |
|---|---|
| invocationId | Stable handle for this run; query, replay, cancel, audit |
| agentId | agt_***, the specific agent instance that ran |
| result | The typed structured output, shape defined by the skill's outputSchema |
| raw | The agent's raw text: its narration, reasoning, what it was about to do next |
| files | Array of file paths the agent wrote during the run: scratch notes, intermediate artifacts, final outputs. Read with Agent · files. |
Treat result as the contract; raw + files are the audit trail. When you need the why behind a decision (and the structured result doesn't carry it), read raw and the files.
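In a typed client, that split maps onto a generic envelope whose result parameter comes from the skill's outputSchema. The TypeScript sketch below is hypothetical: InvocationEnvelope, the RankMonitorResult shape, the sample values, and the summarize helper are illustrative, not published types. It validates result before trusting it and points at raw and files only when the result is not what the caller expected.

```typescript
// Hypothetical typing of the universal response envelope.

interface InvocationEnvelope<T> {
  invocationId: string;
  agentId: string;
  result: T;       // the contract: shape set by the skill's outputSchema
  raw: string;     // audit trail: the agent's narration
  files: string[]; // audit trail: paths written during the run
}

// Example result shape for illustration only; real shapes come from outputSchema.
interface RankMonitorResult {
  drops: { keyword: string; from: number; to: number }[];
}

function isRankMonitorResult(x: unknown): x is RankMonitorResult {
  return typeof x === "object" && x !== null && Array.isArray((x as { drops?: unknown }).drops);
}

function summarize(env: InvocationEnvelope<unknown>): string {
  if (isRankMonitorResult(env.result)) {
    return `${env.result.drops.length} drop(s) in ${env.invocationId}`;
  }
  // Contract not met: point the reader at the audit trail instead of guessing.
  return `unexpected result in ${env.invocationId}; read raw and files: ${env.files.join(", ")}`;
}

const env: InvocationEnvelope<unknown> = {
  invocationId: "inv_123",
  agentId: "agt_456",
  result: { drops: [{ keyword: "seo audit", from: 3, to: 9 }] },
  raw: "Checked tracked keywords; one dropped past the threshold.",
  files: ["notes/rank-check.md"],
};
```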
How agents fit with everything else
| Concept | Relationship to Agent |
|---|---|
| Workspaces | Every invocation is scoped to one workspace. Cross-workspace runs spawn one child per workspace. |
| Tools | The skills' "actions." Skills are built out of CitationBench tools; custom skills can use yours too. |
| Approval Workflows | Each skill step can declare requiresApproval: true. State machine described above. |
| Durability | Invocations are durable — they survive restarts and produce an immutable replay log under the hood. You don't manage the orchestrator yourself. |
| Prompt templates | Skills are defined as prompt templates with a tool-access list. You can read, fork, or override them. |
| Files | The agent can read uploaded files and write its own workspace files during a run. |
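The durability guarantee (design constraint 1 and the Durability row above) can be pictured as an append-only step log that a run consults before doing any work. This TypeScript sketch is hypothetical and in-memory: ReplayLog and runDurably are illustrative names, and a real orchestrator would persist the log. It shows the key property: after a restart, completed steps are replayed from the log instead of being executed again.

```typescript
// Hypothetical sketch: an append-only step log that lets a restarted run
// replay finished steps instead of executing them again. In-memory here;
// a real orchestrator would persist it.

interface LogEntry {
  step: number;
  input: string;
  output: string;
}

class ReplayLog {
  private readonly entries: LogEntry[] = [];

  append(entry: LogEntry): void {
    this.entries.push(Object.freeze({ ...entry })); // entries never change once written
  }

  get(step: number): LogEntry | undefined {
    return this.entries.find((e) => e.step === step);
  }

  all(): readonly LogEntry[] {
    return this.entries;
  }
}

function runDurably(log: ReplayLog, inputs: string[], execute: (input: string) => string): string[] {
  return inputs.map((input, step) => {
    const done = log.get(step);
    if (done !== undefined) return done.output; // replayed: no LLM or tool call
    const output = execute(input);
    log.append({ step, input, output }); // record before moving to the next step
    return output;
  });
}

// Usage: the first attempt only gets through step 0 before a "restart".
const log = new ReplayLog();
let calls = 0;
const execute = (input: string): string => {
  calls++;
  return input.toUpperCase();
};
runDurably(log, ["crawl"], execute);
const outputs = runDurably(log, ["crawl", "research"], execute);
```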
The built-in skill catalog
Eight built-in skills at v1. Each is fully observable, fully approvable, fully replayable.
| Skill | What it does | Calls tools from |
|---|---|---|
| bootstrap_brand | URL → full SEO+GEO operating plan in 20 min | produce.crawl → research.icp → research.keyword → research.competitor → research.discuss → produce.blog_post (planning) → produce.landing_page (briefs) |
| rank_monitor | Recurring rank checks with conditional follow-ups | distribute.track_rank (cron) + optional refresh_stale on drop |
| link_hunter | End-to-end link building | link_building.serp_outreach → link_building.crm.contact.discover → link_building.campaign.send_email |
| citation_hunter | Daily AI search citation tracking + reclamation | research.ai_citation.check → on drop, produce.refine |
| content_factory | Keyword → research → draft → refine → publish | research.discuss → produce.blog_post → produce.refine → produce.publish |
| refresh_stale | Rank drop or citation drop → content audit → updated draft | distribute.track_rank → produce.evaluate → produce.refine |
| keyword_manager | Conversational keyword DB management | research.keyword.list/update/relabel |
| link_swap_evaluator | Score a partner's link-swap proposal | research.competitor.backlinks + Ahrefs DR lookups |
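To make rank_monitor's conditional follow-ups concrete, here is a hypothetical TypeScript sketch of the decision step: compare two rank checks and queue refresh_stale for keywords that fell by at least alertOn.drop positions (the alertOn.drop field appears in the agency fan-out example later on this page). Treating a drop as a position delta, treating a keyword that stops ranking as a drop, and the RankCheck and FollowUp types are all assumptions.

```typescript
// Hypothetical sketch of rank_monitor's follow-up decision; not CitationBench code.

interface RankCheck {
  keyword: string;
  position: number | null; // null = no longer ranking (assumption)
}

interface FollowUp {
  skill: "refresh_stale";
  input: { keyword: string; from: number; to: number | null };
}

function followUps(previous: RankCheck[], current: RankCheck[], alertOn: { drop: number }): FollowUp[] {
  const before = new Map<string, number | null>();
  for (const c of previous) before.set(c.keyword, c.position);
  const out: FollowUp[] = [];
  for (const c of current) {
    const from = before.get(c.keyword);
    if (from === undefined || from === null) continue; // no baseline to compare with
    // A higher position number is a worse rank, so a drop is a positive delta.
    const dropped = c.position === null || c.position - from >= alertOn.drop;
    if (dropped) out.push({ skill: "refresh_stale", input: { keyword: c.keyword, from, to: c.position } });
  }
  return out;
}

const previous: RankCheck[] = [
  { keyword: "seo audit", position: 3 },
  { keyword: "rank tracker", position: 4 },
  { keyword: "link building", position: 7 },
];
const current: RankCheck[] = [
  { keyword: "seo audit", position: 9 },
  { keyword: "rank tracker", position: 6 },
  { keyword: "link building", position: null },
  { keyword: "geo tools", position: 1 },
];
const queued = followUps(previous, current, { drop: 5 });
```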
You can also:
- Fork a built-in skill (agent.skills.fork(slug)) to make a custom, workspace-scoped version with different defaults.
- Define your own by registering a new prompt template with the available tool list. No code deploy needed.
Code samples
REST
```bash
# Invoke a skill
curl -X POST https://api.citationbench.com/v1/agent/invoke \
  -H "Authorization: Bearer sk_live_***" \
  -H "X-Workspace-Id: ws_acme" \
  -H "Content-Type: application/json" \
  -d '{
    "skill": "bootstrap_brand",
    "input": { "domain": "acme.com", "depth": "thorough" },
    "approval": { "required": true },
    "mode": "BACKGROUND"
  }'
# → 202 Accepted
# {
#   "invocationId": "inv_01HVZ...",
#   "agentId": "agt_01HVZ...",
#   "skill": "bootstrap_brand",
#   "status": "PENDING",
#   "links": { ... }
# }
```
MCP (natural language)
> Bootstrap acme.com: full SEO and GEO research. Pause at each step for me to approve.

Claude calls agent.invoke with skill: "bootstrap_brand" and the matching input. The MCP server streams progress as notifications.
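For TypeScript callers not using the SDK client, the REST curl call above maps onto a plain HTTP request. This sketch relies only on the endpoint, headers, and body fields shown in that sample; buildInvokeRequest, invoke, InvokeBody, and FetchLike are hypothetical names. The fetch implementation is passed in, and the global fetch in Node 18+ or a browser fits the FetchLike type.

```typescript
// Hypothetical sketch of the REST invoke call without an SDK. Endpoint,
// headers, and body fields mirror the curl sample; names are illustrative.

interface InvokeBody {
  skill: string;
  input: Record<string, unknown>;
  approval?: { required: boolean };
  mode?: "FOREGROUND" | "BACKGROUND";
}

interface InvokeInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
  text(): Promise<string>;
}

type FetchLike = (url: string, init: InvokeInit) => Promise<ResponseLike>;

const API_BASE = "https://api.citationbench.com/v1";

function buildInvokeRequest(apiKey: string, workspaceId: string, body: InvokeBody): { url: string; init: InvokeInit } {
  return {
    url: `${API_BASE}/agent/invoke`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "X-Workspace-Id": workspaceId,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

async function invoke(fetchFn: FetchLike, apiKey: string, workspaceId: string, body: InvokeBody): Promise<unknown> {
  const { url, init } = buildInvokeRequest(apiKey, workspaceId, body);
  const res = await fetchFn(url, init);
  // A BACKGROUND invocation is expected to answer 202 Accepted with a PENDING record.
  if (!res.ok) throw new Error(`invoke failed with HTTP ${res.status}: ${await res.text()}`);
  return res.json();
}

const req = buildInvokeRequest("sk_live_***", "ws_acme", {
  skill: "bootstrap_brand",
  input: { domain: "acme.com", depth: "thorough" },
  approval: { required: true },
  mode: "BACKGROUND",
});
// In a module or async function: const envelope = await invoke(fetch, "sk_live_***", "ws_acme", { ... });
```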
Common patterns
1. Fire-and-forget at agency scale
For agencies, the common pattern is BACKGROUND mode with cross-workspace fan-out. One call kicks off the same skill across every client workspace.
```bash
curl -X POST https://api.citationbench.com/v1/workspaces/bulk-action \
  -H "Authorization: Bearer sk_live_***" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "agent.invoke",
    "workspaces": "all",
    "config": {
      "skill": "rank_monitor",
      "input": { "alertOn": { "drop": 5 } }
    }
  }'
```
2. Foreground with streaming
When a human is watching (Claude Code, CLI), use FOREGROUND + SSE event stream.
```bash
INVOCATION=$(curl -sf -X POST .../agent/invoke \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{...}' | jq -r '.invocationId')
curl -N -H "Authorization: Bearer $KEY" \
  "https://api.citationbench.com/v1/agent/invocations/$INVOCATION/events"
```
3. Approval everywhere outbound
Set approval.required: true on any skill whose steps touch the outside world (publishing, outreach, indexing). The agent pauses; you decide.
4. Compose your own skill
For workflows that don't match a built-in, register a custom skill via the prompt-template API. Then invoke it like any other skill.
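```bash
curl -X POST https://api.citationbench.com/v1/agent/invoke \
  -H "Authorization: Bearer sk_live_***" \
  -H "X-Workspace-Id: ws_acme" \
  -H "Content-Type: application/json" \
  -d '{
    "skill": "custom:my-weekly-audit",
    "input": { "workspaceId": "ws_acme" }
  }'
```
The custom: prefix tells the system to load the skill from your workspace-scoped skill registry instead of the built-in registry.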
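One way to picture the prefix rule, together with forking from the list above, is the hypothetical TypeScript sketch below. resolveSkill, fork, the two Map registries, and the {{url}} template placeholder are illustrative, and storing forks under the unprefixed slug is an assumption. What it demonstrates is that a fork can share its name with the built-in it came from, because the custom: prefix decides which registry is consulted.

```typescript
// Hypothetical sketch: "custom:" selects the workspace registry; forks are
// copies of built-ins with overridden fields. Not CitationBench's API.

interface SkillDef {
  slug: string;
  promptTemplate: string;
  tools: string[];
}

const CUSTOM_PREFIX = "custom:";

function resolveSkill(slug: string, builtIns: Map<string, SkillDef>, workspace: Map<string, SkillDef>): SkillDef {
  const isCustom = slug.startsWith(CUSTOM_PREFIX);
  const key = isCustom ? slug.slice(CUSTOM_PREFIX.length) : slug;
  const def = (isCustom ? workspace : builtIns).get(key);
  if (def === undefined) {
    throw new Error(`skill ${slug} not found in the ${isCustom ? "workspace" : "built-in"} registry`);
  }
  return def;
}

// Copy a built-in into the workspace registry with different defaults,
// and return the slug to invoke it with.
function fork(slug: string, builtIns: Map<string, SkillDef>, workspace: Map<string, SkillDef>, overrides: Partial<SkillDef>): string {
  const base = builtIns.get(slug);
  if (base === undefined) throw new Error(`no built-in skill ${slug}`);
  workspace.set(slug, { ...base, ...overrides, slug });
  return CUSTOM_PREFIX + slug;
}

const builtIns = new Map<string, SkillDef>([
  ["refresh_stale", { slug: "refresh_stale", promptTemplate: "Audit and refresh {{url}}", tools: ["produce.evaluate", "produce.refine"] }],
]);
const workspace = new Map<string, SkillDef>();
const forked = fork("refresh_stale", builtIns, workspace, { tools: ["produce.evaluate"] });
```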
5. Multi-turn conversation
For conversational skills like keyword_manager, pass sessionId to continue.
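```typescript
// Assumes `cb` is an initialized CitationBench SDK client and that this runs
// in an ES module or async function (top-level await).
const first = await cb.agent.invoke({
  skill: "keyword_manager",
  input: {
    message: "Show me PROBLEM_SOLUTION keywords missing landing pages.",
  },
});
// Pass the returned sessionId back to continue the same conversation.
const second = await cb.agent.invoke({
  skill: "keyword_manager",
  sessionId: first.sessionId,
  input: { message: "Drop the ones with KD > 40." },
});
```
Related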
- API: Agent · invoke
- API: Agent · files
- API: Agent · approval
- API: Inventory
- Concept: Approval workflows
- Concept: Prompt templates
- Playbook: Keyword research for a brand in 20 minutes
- Playbook: Build an SEO agent in Claude Code