How CitationBench models the Keyword resource — lifecycle states, source provenance, priority, pillar mapping, and rank history — separately from the 2D labeling system.

The Keyword is the most-used resource in CitationBench. Every blog post, landing page, rank check, outreach campaign, and AI citation eventually points back to a keyword. This page explains how we model keywords: the lifecycle, the source provenance, the priority system, and the pillar relationship. The 2D labeling system is covered on its own concept page.

The short version

A Keyword is an org-scoped persistent record with a lifecycle (RAW → LABELLING → LABELED → FOCUSED → ARCHIVED)
Each keyword carries provenance (where it came from), labels (the 2D taxonomy), tags, a priority, an optional pillar, and a rank history
The same keyword string can exist in many workspaces, but it is unique within each workspace
Operations on keywords (research, search, label, tag, check rank) live on the Research · keyword API page
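
The workspace-level uniqueness rule can be pictured as a composite key. The following is a minimal sketch rather than the CitationBench implementation: the `KeywordIndex` class is hypothetical, and the case and whitespace normalisation is an assumption made for illustration.

```python
class KeywordIndex:
    """Toy in-memory index: one record per (workspace, keyword string)."""

    def __init__(self) -> None:
        self._seen = set()

    @staticmethod
    def _normalise(keyword: str) -> str:
        # Assumption: uniqueness ignores case and repeated whitespace.
        return " ".join(keyword.lower().split())

    def add(self, organization_id: str, keyword: str) -> bool:
        """Return True if the keyword was added, False if it already exists."""
        key = (organization_id, self._normalise(keyword))
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


index = KeywordIndex()
print(index.add("ws_acme", "project management software"))   # new in ws_acme
print(index.add("ws_other", "project management software"))  # same string, other workspace
print(index.add("ws_acme", "Project  Management Software"))  # duplicate within ws_acme
```

The same string is accepted once per workspace and rejected on a repeat within the same workspace.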

Why we modeled it this way

Three design constraints shaped this:

1. Provenance matters for trust. When an agent surfaces a keyword for content creation, you need to know where it came from. A keyword may come from DataForSEO related-keywords, Ahrefs matching, an LLM mention pass, a Google Search Console import, or a manual entry, and each origin has different reliability. We carry source and sourceDetails on every keyword so downstream tools can weight keywords by origin.

2. Lifecycle states avoid premature commitment. Most keywords start as RAW — discovered but not yet evaluated. The labeling pass moves them to LABELED. The strategist (or the agent) promotes the ones worth pursuing to FOCUSED. Stale or off-target ones get ARCHIVED. You can run different tools against different lifecycle slices.

3. Priority + pillar + tags are three independent axes. Priority answers "how urgent." Pillar answers "what content theme." Tags answer everything else. Conflating them caused every previous SEO tool we used to drown its keyword DB in soup. Keeping them orthogonal lets you cross-filter naturally.
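
Because the three axes are independent, a cross-filter can treat each one as an optional constraint. This is an illustrative sketch over plain dictionaries shaped like the keyword record on this page; the `cross_filter` helper is hypothetical and not part of the CitationBench API.

```python
from typing import Iterable, List, Optional


def cross_filter(
    keywords: Iterable[dict],
    priority: Optional[str] = None,
    pillar_id: Optional[str] = None,
    tag: Optional[str] = None,
) -> List[dict]:
    """Apply each axis independently; None leaves that axis unconstrained."""
    result = []
    for kw in keywords:
        if priority is not None and kw.get("priority") != priority:
            continue
        if pillar_id is not None and kw.get("pillarId") != pillar_id:
            continue
        if tag is not None and tag not in kw.get("tags", []):
            continue
        result.append(kw)
    return result


sample = [
    {"id": "kw_a", "priority": "HIGH", "pillarId": "pil_pricing", "tags": ["q2-2026"]},
    {"id": "kw_b", "priority": "HIGH", "pillarId": None, "tags": ["engineering-icp"]},
    {"id": "kw_c", "priority": "LOW", "pillarId": "pil_pricing", "tags": ["q2-2026"]},
]

# Urgent keywords inside one pillar, regardless of tags.
print([kw["id"] for kw in cross_filter(sample, priority="HIGH", pillar_id="pil_pricing")])
```

Any combination of the three arguments works, which is the practical payoff of keeping the axes orthogonal.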

The data model

{
  "id": "kw_01HVZ...",
  "keyword": "project management software for engineering teams",
  "organizationId": "ws_acme",

  "source": "DATAFORSEO",
  "sourceDetails": {
    "method": "keyword_ideas",
    "seedTerm": "project management software"
  },
  "importedAt": "2026-05-24T08:01:42Z",
  "importBatchId": "inv_01HVZ...",

  "status": "LABELED",
  "priority": "HIGH",

  "intentLabels": ["SPECIFICATION", "PROBLEM_SOLUTION"],
  "intentConfidence": 0.91,
  "relevanceLabel": "OFFERING",
  "relevanceConfidence": 0.87,
  "isHighIntent": true,
  "isHighRelevance": true,
  "labelReason": "Searcher comparing PM tools by feature; aligned to our core offering.",
  "isAiGenerated": true,
  "labeledAt": "2026-05-24T08:02:18Z",

  "pillarId": "pil_pricing",
  "tags": ["q2-2026", "engineering-icp"],

  "parentKeywordId": null,
  "notes": null,

  "createdAt": "2026-05-24T08:01:42Z",
  "updatedAt": "2026-05-24T08:02:18Z"
}
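
A client might map this record onto a typed structure. The sketch below is one possible shape, assuming only the field names shown above; the `Keyword` dataclass, the snake_case attribute names, and the choice of which fields to keep are illustrative, not an official SDK.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Keyword:
    id: str
    keyword: str
    organization_id: str
    source: str
    status: str
    priority: str
    pillar_id: Optional[str]
    parent_keyword_id: Optional[str]
    tags: Tuple[str, ...]
    created_at: datetime


def parse_timestamp(value: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so fromisoformat
    # also accepts the value on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def keyword_from_json(raw: dict) -> Keyword:
    return Keyword(
        id=raw["id"],
        keyword=raw["keyword"],
        organization_id=raw["organizationId"],
        source=raw["source"],
        status=raw["status"],
        priority=raw.get("priority", "MEDIUM"),  # MEDIUM is the documented default
        pillar_id=raw.get("pillarId"),
        parent_keyword_id=raw.get("parentKeywordId"),
        tags=tuple(raw.get("tags", [])),
        created_at=parse_timestamp(raw["createdAt"]),
    )


record = json.loads("""{
  "id": "kw_01HVZ...",
  "keyword": "project management software for engineering teams",
  "organizationId": "ws_acme",
  "source": "DATAFORSEO",
  "status": "LABELED",
  "priority": "HIGH",
  "pillarId": "pil_pricing",
  "tags": ["q2-2026", "engineering-icp"],
  "parentKeywordId": null,
  "createdAt": "2026-05-24T08:01:42Z"
}""")
kw = keyword_from_json(record)
print(kw.status, kw.priority, kw.tags)
```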

Source enum

Value	Meaning
`DATAFORSEO`	Discovered via DataForSEO (related, ideas, LLM mentions)
`AHREFS`	Discovered via Ahrefs (matching, related, suggestions)
`GOOGLE_SEARCH_CONSOLE`	Imported from a GSC property's actual impressions
`AI_SUGGESTION`	An agent (e.g. during `bootstrap_brand`) proposed it
`USER_INPUT`	Manually entered
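
One way a downstream tool could weight keywords by source is a simple lookup table. The weights below are invented for illustration and are not CitationBench defaults; the `weighted_relevance` helper is likewise hypothetical.

```python
# Hypothetical reliability weights per source; tune these to your own data.
SOURCE_WEIGHT = {
    "GOOGLE_SEARCH_CONSOLE": 1.0,  # real impressions
    "USER_INPUT": 0.9,
    "AHREFS": 0.7,
    "DATAFORSEO": 0.7,
    "AI_SUGGESTION": 0.5,
}


def weighted_relevance(keyword: dict) -> float:
    """Scale the relevance confidence by how much we trust the source."""
    weight = SOURCE_WEIGHT.get(keyword.get("source"), 0.0)
    # Unlabeled keywords carry no confidence yet, so they score 0.
    confidence = keyword.get("relevanceConfidence") or 0.0
    return weight * confidence


candidates = [
    {"keyword": "a", "source": "AI_SUGGESTION", "relevanceConfidence": 0.9},
    {"keyword": "b", "source": "GOOGLE_SEARCH_CONSOLE", "relevanceConfidence": 0.8},
]
ranked = sorted(candidates, key=weighted_relevance, reverse=True)
print([kw["keyword"] for kw in ranked])
```

With these example weights, a GSC keyword at 0.8 confidence outranks an agent suggestion at 0.9.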

Status enum

RAW         → discovered, not yet labeled
LABELLING   → currently in the labeling pass
LABELED     → has intent + relevance + confidence
FOCUSED     → promoted as a keyword we're actively pursuing
ARCHIVED    → soft-deleted; excluded from default queries

Lifecycle transitions are open: you can move a keyword from any state to any other. In practice, agents drive most transitions.
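
The status values and the "any state to any other" rule can be sketched as follows. This is a client-side illustration under the assumptions stated in the comments; the `transition` and `default_query` helpers are hypothetical names, not API operations.

```python
from enum import Enum
from typing import Iterable, List, Union


class KeywordStatus(str, Enum):
    RAW = "RAW"
    LABELLING = "LABELLING"
    LABELED = "LABELED"
    FOCUSED = "FOCUSED"
    ARCHIVED = "ARCHIVED"


def transition(keyword: dict, new_status: Union[str, KeywordStatus]) -> dict:
    """Return a copy with the new status.

    Transitions are open, so the only check is that the target is a
    known status; there is no table of allowed source/target pairs.
    """
    return {**keyword, "status": KeywordStatus(new_status).value}


def default_query(keywords: Iterable[dict], include_archived: bool = False) -> List[dict]:
    """Mimic a default listing, which leaves out ARCHIVED keywords."""
    return [
        kw for kw in keywords
        if include_archived or kw["status"] != KeywordStatus.ARCHIVED.value
    ]


archived = {"id": "kw_1", "status": "ARCHIVED"}
revived = transition(archived, KeywordStatus.RAW)  # ARCHIVED -> RAW is allowed
print(revived["status"], len(default_query([archived, revived])))
```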

Priority enum

CRITICAL  → highest; agent prioritizes for content / rank tracking
HIGH      → next
MEDIUM    → default
LOW       → below default
BACKLOG   → discovered but not actionable yet

Priority is used by agents.rank_monitor, content_factory, and keyword_manager to order their work.
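
Ordering work by priority amounts to sorting on the enum's position. A minimal sketch, assuming keywords are plain dictionaries and that a missing priority falls back to the documented default of MEDIUM; `order_work` is a hypothetical helper.

```python
from typing import Iterable, List

# Highest priority first, matching the enum listing above.
PRIORITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "BACKLOG"]
PRIORITY_RANK = {name: position for position, name in enumerate(PRIORITY_ORDER)}


def order_work(keywords: Iterable[dict]) -> List[dict]:
    """Sort keywords so the most urgent come first.

    sorted() is stable, so keywords with equal priority keep their
    original relative order.
    """
    return sorted(keywords, key=lambda kw: PRIORITY_RANK[kw.get("priority", "MEDIUM")])


queue = order_work([
    {"id": "kw_backlog", "priority": "BACKLOG"},
    {"id": "kw_default"},  # no priority set, treated as MEDIUM
    {"id": "kw_critical", "priority": "CRITICAL"},
    {"id": "kw_high", "priority": "HIGH"},
])
print([kw["id"] for kw in queue])
```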

How it interacts with other concepts

Concept	Relationship
2D Keyword Labelling	Every labeled keyword has an intent × relevance pair with confidence
Pillars	Keywords optionally belong to one `LandingPagePillar`. Pillars set default voice + landing page template.
Tags	Many-to-many; org-scoped reusable label set
BlogPost / LandingPage	Keywords are linked to one or more blog posts and landing pages (primary + secondary)
KeywordRank	Each keyword has a rolling history of `KeywordRank` records — position, URL, owned-domain flag, location, device
Competitor	`CompetitorKeyword` is a separate model — keywords your competitors rank for, tied to the competitor domain

Common patterns

1. Discover → label → focus → write

research.keyword         (discover, source: DATAFORSEO/AHREFS, status: RAW)
↓
labeling pass            (status: LABELLING → LABELED)
↓
keyword.search filters   (you pick the winners by label, keyword difficulty, and volume)
↓
PATCH keyword.priority = HIGH, status = FOCUSED
↓
content_factory          (writes content for FOCUSED keywords first)
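
The "pick the winners" and "promote" steps of this flow might look like the sketch below. The data model on this page does not show difficulty or volume fields, so `keywordDifficulty` and `searchVolume` are assumed names, and the flat shape of the PATCH body is also an assumption.

```python
from typing import Iterable, List


def pick_winners(keywords: Iterable[dict], max_difficulty: float, min_volume: int) -> List[dict]:
    """Keep LABELED keywords that are high intent, high relevance,
    easy enough to rank for, and searched often enough."""
    winners = []
    for kw in keywords:
        if kw.get("status") != "LABELED":
            continue
        if not (kw.get("isHighIntent") and kw.get("isHighRelevance")):
            continue
        # Assumed field names; missing values are treated pessimistically.
        if kw.get("keywordDifficulty", 100) > max_difficulty:
            continue
        if kw.get("searchVolume", 0) < min_volume:
            continue
        winners.append(kw)
    return winners


def focus_patch() -> dict:
    """Body for the promotion step: priority = HIGH, status = FOCUSED."""
    return {"priority": "HIGH", "status": "FOCUSED"}


labeled = [
    {"id": "kw_1", "status": "LABELED", "isHighIntent": True, "isHighRelevance": True,
     "keywordDifficulty": 30, "searchVolume": 900},
    {"id": "kw_2", "status": "LABELED", "isHighIntent": True, "isHighRelevance": False,
     "keywordDifficulty": 10, "searchVolume": 5000},
    {"id": "kw_3", "status": "RAW"},
]
for kw in pick_winners(labeled, max_difficulty=40, min_volume=500):
    print(kw["id"], focus_patch())
```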

2. Source-based filtering

To find keywords that actually get impressions in Google Search Console (as opposed to aspirational ones from DataForSEO), filter by source:

curl -G .../v1/keywords --data-urlencode "source=GOOGLE_SEARCH_CONSOLE"

3. Parent–child variants

A keyword can set parentKeywordId to the "head" keyword of which it is a long-tail variant. This is useful for clustering.
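
Clustering on parentKeywordId can be done client-side with a single pass. This is an illustrative sketch over plain dictionaries; the `cluster_by_parent` helper is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def cluster_by_parent(keywords: Iterable[dict]) -> Dict[str, List[str]]:
    """Group keyword strings under the id of their head keyword.

    A keyword without a parent is treated as the head of its own cluster.
    """
    clusters = defaultdict(list)
    for kw in keywords:
        head_id = kw.get("parentKeywordId") or kw["id"]
        clusters[head_id].append(kw["keyword"])
    return dict(clusters)


records = [
    {"id": "kw_head", "keyword": "project management software", "parentKeywordId": None},
    {"id": "kw_v1", "keyword": "project management software for engineering teams",
     "parentKeywordId": "kw_head"},
    {"id": "kw_solo", "keyword": "gantt chart template", "parentKeywordId": None},
]
for head_id, members in cluster_by_parent(records).items():
    print(head_id, members)
```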

4. Bulk import + auto-label

curl -X POST .../v1/keywords/bulk -H "Content-Type: application/json" -d '{
  "keywords": [
    { "keyword": "...", "source": "USER_INPUT" },
    { "keyword": "...", "source": "USER_INPUT" }
  ],
  "label": true,
  "pillarId": "pil_pricing"
}'
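
When the keyword list comes from a spreadsheet or a text file, the request body can be assembled programmatically. The sketch below only builds the JSON shown above; the `build_bulk_payload` helper and its whitespace cleanup and de-duplication are assumptions, and sending the request is left to whatever HTTP client you use.

```python
import json
from typing import Iterable, Optional


def build_bulk_payload(
    keyword_strings: Iterable[str],
    pillar_id: Optional[str] = None,
    label: bool = True,
) -> str:
    """Build a JSON body for a bulk import of manually entered keywords."""
    seen = set()
    entries = []
    for text in keyword_strings:
        cleaned = " ".join(text.split())  # trim and collapse whitespace
        if not cleaned or cleaned.lower() in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(cleaned.lower())
        entries.append({"keyword": cleaned, "source": "USER_INPUT"})
    payload = {"keywords": entries, "label": label}
    if pillar_id is not None:
        payload["pillarId"] = pillar_id
    return json.dumps(payload)


body = build_bulk_payload(
    ["sprint planning tool", "  Sprint planning tool ", "", "kanban for engineers"],
    pillar_id="pil_pricing",
)
print(body)
```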

5. Relabel after a product change

Edit your workspace's product description, then:

curl -X POST .../v1/keywords/relabel -H "Content-Type: application/json" -d '{ "scope": { "status": "LABELED" } }'

The agent re-runs the 2D labeling pass with the new context.

API: Research · keyword — full CRUD
Concept: 2D Keyword Labelling
Concept: Workspaces
Concept: Agent
Playbook: Keyword research for a brand in 20 minutes
