Keywords: lifecycle, provenance, priority, and pillar
How CitationBench models the Keyword resource — lifecycle states, source provenance, priority, pillar mapping, and rank history — separately from the 2D labeling system.
The Keyword is the most-used resource in CitationBench. Every blog post, landing page, rank check, outreach campaign, and AI citation eventually points back to one. This page explains how we model them — the lifecycle, the source provenance, the priority system, the pillar relationship — separately from the 2D labeling system (which gets its own concept page).
The short version
- A Keyword is an org-scoped persistent record with a lifecycle (RAW→LABELLING→LABELED→FOCUSED→ARCHIVED)
- Each carries provenance (where it came from), labels (the 2D taxonomy), tags, priority, an optional pillar, and rank history
- The same keyword string can exist in many workspaces — but is unique within a workspace
- Operations on keywords (research, search, label, tag, check rank) live on the Research · keyword API page
Why we modeled it this way
Three design constraints shaped this:
1. Provenance matters for trust. When an agent surfaces a keyword for content creation, you need to know where it came from. DataForSEO related-keywords, Ahrefs matching, an LLM mention pass, a Google Search Console import, and a manual entry each have different reliability. We carry source and sourceDetails on every keyword so downstream tools can weight them differently.
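The sketch below shows one way a downstream tool might use `source` to weight a keyword. It is a minimal Python illustration: the weights, `DEFAULT_WEIGHT`, and the `weighted_score` helper are hypothetical and are not CitationBench defaults.

```python
# Hypothetical reliability weights per Source value; CitationBench does not publish these numbers.
SOURCE_WEIGHTS = {
    "GOOGLE_SEARCH_CONSOLE": 1.0,  # observed impressions on a real property
    "USER_INPUT": 0.9,             # a human chose it deliberately
    "AHREFS": 0.7,
    "DATAFORSEO": 0.7,
    "AI_SUGGESTION": 0.5,          # agent-proposed, least verified
}
DEFAULT_WEIGHT = 0.5  # cautious fallback for sources this sketch doesn't know


def weighted_score(keyword: dict, base_score: float) -> float:
    """Scale a downstream tool's score by how much we trust the keyword's source."""
    return base_score * SOURCE_WEIGHTS.get(keyword["source"], DEFAULT_WEIGHT)
```

Because the weight is applied at read time, you can change how much you trust a source without rewriting any stored keywords.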
2. Lifecycle states avoid premature commitment. Most keywords start as RAW — discovered but not yet evaluated. The labeling pass moves them to LABELED. The strategist (or the agent) promotes the ones worth pursuing to FOCUSED. Stale or off-target ones get ARCHIVED. You can run different tools against different lifecycle slices.
3. Priority + pillar + tags are three independent axes. Priority answers "how urgent." Pillar answers "what content theme." Tags answer everything else. Every SEO tool we used before conflated them, and each one's keyword database ended up as undifferentiated soup. Keeping them orthogonal lets you cross-filter naturally.
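Because the three axes are independent, a cross-filter is just a conjunction of per-axis checks. This is a minimal client-side Python sketch over keyword dicts shaped like the data model below. The `cross_filter` name and its arguments are hypothetical and do not describe the API's query parameters.

```python
from typing import Iterable, Optional


def cross_filter(
    keywords: Iterable[dict],
    priority: Optional[str] = None,
    pillar_id: Optional[str] = None,
    tag: Optional[str] = None,
) -> list[dict]:
    """Filter on each axis independently; None means 'don't filter on this axis'."""
    result = []
    for kw in keywords:
        if priority is not None and kw.get("priority") != priority:
            continue
        if pillar_id is not None and kw.get("pillarId") != pillar_id:
            continue
        # Tags are many-to-many, so membership replaces equality here.
        if tag is not None and tag not in (kw.get("tags") or []):
            continue
        result.append(kw)
    return result
```

For example, `cross_filter(kws, priority="HIGH", tag="q2-2026")` ignores pillars entirely, which you cannot do if priority or theme is encoded in a tag string.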
The data model
```json
{
  "id": "kw_01HVZ...",
  "keyword": "project management software for engineering teams",
  "organizationId": "ws_acme",
  "source": "DATAFORSEO",
  "sourceDetails": {
    "method": "keyword_ideas",
    "seedTerm": "project management software"
  },
  "importedAt": "2026-05-24T08:01:42Z",
  "importBatchId": "inv_01HVZ...",
  "status": "LABELED",
  "priority": "HIGH",
  "intentLabels": ["SPECIFICATION", "PROBLEM_SOLUTION"],
  "intentConfidence": 0.91,
  "relevanceLabel": "OFFERING",
  "relevanceConfidence": 0.87,
  "isHighIntent": true,
  "isHighRelevance": true,
  "labelReason": "Searcher comparing PM tools by feature; aligned to our core offering.",
  "isAiGenerated": true,
  "labeledAt": "2026-05-24T08:02:18Z",
  "pillarId": "pil_pricing",
  "tags": ["q2-2026", "engineering-icp"],
  "parentKeywordId": null,
  "notes": null,
  "createdAt": "2026-05-24T08:01:42Z",
  "updatedAt": "2026-05-24T08:02:18Z"
}
```
Source enum
| Value | Meaning |
|---|---|
| DATAFORSEO | Discovered via DataForSEO (related, ideas, LLM mentions) |
| AHREFS | Discovered via Ahrefs (matching, related, suggestions) |
| GOOGLE_SEARCH_CONSOLE | Imported from a GSC property's actual impressions |
| AI_SUGGESTION | An agent (e.g. during bootstrap_brand) proposed it |
| USER_INPUT | Manually entered |
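If you consume the data model in your own code, a small typed mirror helps you catch bad `source` values early. This is a minimal Python sketch, not an official SDK. The `Keyword` class, its snake_case attribute names, and `from_json` are hypothetical. It keeps only a subset of fields and leaves out the labelling fields, which belong to the 2D Keyword Labelling concept.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Values from the Source enum table above.
VALID_SOURCES = {"DATAFORSEO", "AHREFS", "GOOGLE_SEARCH_CONSOLE", "AI_SUGGESTION", "USER_INPUT"}


@dataclass
class Keyword:
    id: str
    keyword: str
    organization_id: str
    source: str
    status: str
    priority: str = "MEDIUM"  # MEDIUM is the documented default priority
    pillar_id: Optional[str] = None
    tags: list[str] = field(default_factory=list)
    parent_keyword_id: Optional[str] = None

    @classmethod
    def from_json(cls, data: dict) -> "Keyword":
        """Build a Keyword from the API's camelCase JSON; fields not listed here are ignored."""
        if data["source"] not in VALID_SOURCES:
            raise ValueError(f"unknown source: {data['source']}")
        return cls(
            id=data["id"],
            keyword=data["keyword"],
            organization_id=data["organizationId"],
            source=data["source"],
            status=data["status"],
            priority=data.get("priority") or "MEDIUM",
            pillar_id=data.get("pillarId"),
            tags=data.get("tags") or [],
            parent_keyword_id=data.get("parentKeywordId"),
        )


# Usage: a freshly discovered keyword with no priority, pillar, or tags yet.
raw = '{"id": "kw_1", "keyword": "pm software", "organizationId": "ws_acme", "source": "DATAFORSEO", "status": "RAW"}'
kw = Keyword.from_json(json.loads(raw))
```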
Status enum
```
RAW       → discovered, not yet labeled
LABELLING → currently in the labeling pass
LABELED   → has intent + relevance + confidence
FOCUSED   → promoted as a keyword we're actively pursuing
ARCHIVED  → soft-deleted; excluded from default queries
```
Lifecycle transitions are open: you can move a keyword from any state to any other. Most are agent-driven.
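Open transitions and soft deletion can be modelled without a transition table. In the minimal Python sketch below, `KeywordStatus` copies the enum values above. `transition` and `lifecycle_slice` are hypothetical helpers: one moves a keyword between states, the other selects a lifecycle slice to run a tool against.

```python
from enum import Enum


class KeywordStatus(str, Enum):
    RAW = "RAW"
    LABELLING = "LABELLING"
    LABELED = "LABELED"
    FOCUSED = "FOCUSED"
    ARCHIVED = "ARCHIVED"


def transition(keyword: dict, new_status: KeywordStatus) -> dict:
    """Return a copy of the keyword in the new state.

    Transitions are open, so there is deliberately no allow-list of legal moves.
    """
    return {**keyword, "status": new_status.value}


def lifecycle_slice(keywords: list[dict], *statuses: KeywordStatus) -> list[dict]:
    """Select keywords in the given states; with no states, mimic a default query."""
    if not statuses:
        # Default queries exclude ARCHIVED (soft-deleted) keywords.
        return [kw for kw in keywords if kw["status"] != KeywordStatus.ARCHIVED.value]
    wanted = {s.value for s in statuses}
    return [kw for kw in keywords if kw["status"] in wanted]
```

Un-archiving is therefore just another transition, for example `transition(kw, KeywordStatus.RAW)`.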
Priority enum
```
CRITICAL → highest; agent prioritizes for content / rank tracking
HIGH     → next
MEDIUM   → default
LOW      → below default
BACKLOG  → discovered but not actionable yet
```
Used by agents.rank_monitor, content_factory, and keyword_manager to order work.
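The simplest way to "order work" by this enum is a rank lookup and a stable sort. This is a minimal Python sketch and an assumption on our part: the real agents may also weigh other signals. `PRIORITY_RANK` and `order_work` are hypothetical names.

```python
# Rank follows the Priority enum order above: CRITICAL first, BACKLOG last.
PRIORITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3, "BACKLOG": 4}


def order_work(keywords: list[dict]) -> list[dict]:
    """Most urgent first. sorted() is stable, so equal priorities keep their input order."""
    # A missing priority is treated as MEDIUM, the documented default.
    return sorted(keywords, key=lambda kw: PRIORITY_RANK[kw.get("priority") or "MEDIUM"])
```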
How it interacts with other concepts
| Concept | Relationship |
|---|---|
| 2D Keyword Labelling | Every labeled keyword has an intent × relevance pair with confidence |
| Pillars | Keywords optionally belong to one LandingPagePillar. Pillars set default voice + landing page template. |
| Tags | Many-to-many; org-scoped reusable label set |
| BlogPost / LandingPage | Keywords are linked to one or more blog posts and landing pages (primary + secondary) |
| KeywordRank | Each keyword has a rolling history of KeywordRank records — position, URL, owned-domain flag, location, device |
| Competitor | CompetitorKeyword is a separate model — keywords your competitors rank for, tied to the competitor domain |
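A common question for the KeywordRank history is "where does our own page rank right now, for this location and device?" The minimal Python sketch below answers it. The field names are hypothetical Python mirrors of the documented position, URL, owned-domain flag, location, and device. `checked_at` is an assumption, because the table does not name a timestamp field.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class KeywordRank:
    position: int
    url: str
    is_owned_domain: bool
    location: str
    device: str
    checked_at: datetime  # hypothetical: when this rank check ran


def latest_owned_position(history: list[KeywordRank], location: str, device: str) -> Optional[int]:
    """Most recent position of one of our own URLs for a location/device pair, or None."""
    matches = [
        r for r in history
        if r.is_owned_domain and r.location == location and r.device == device
    ]
    if not matches:
        return None
    return max(matches, key=lambda r: r.checked_at).position
```

Filtering on the owned-domain flag matters because a rank check can also return competitor URLs for the same keyword.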
Common patterns
1. Discover → label → focus → write
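The "pick the winners" and PATCH steps of the flow that follows can be approximated locally before you send any updates. This is a minimal Python sketch with several assumptions. `kd` and `volume` are hypothetical field names, since the data model does not show where keyword difficulty and search volume live. The thresholds are illustrative, and `pick_and_focus` is not part of the API.

```python
def pick_and_focus(keywords: list[dict], max_kd: int = 40, min_volume: int = 100) -> list[dict]:
    """Return FOCUSED/HIGH copies of LABELED keywords that clear the label, KD, and volume filters."""
    focused = []
    for kw in keywords:
        if kw["status"] != "LABELED":
            continue  # only labeled keywords carry the intent/relevance flags to judge
        if not (kw["isHighIntent"] and kw["isHighRelevance"]):
            continue
        # "kd" and "volume" are hypothetical field names for keyword difficulty and search volume.
        if kw["kd"] > max_kd or kw["volume"] < min_volume:
            continue
        # Local equivalent of: PATCH keyword.priority = HIGH, status = FOCUSED
        focused.append({**kw, "priority": "HIGH", "status": "FOCUSED"})
    return focused
```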
```
research.keyword (discover, source: DATAFORSEO/AHREFS, status: RAW)
  ↓
labeling pass (status: LABELLING → LABELED)
  ↓
keyword.search filters (you pick the winners by label + KD + volume)
  ↓
PATCH keyword.priority = HIGH, status = FOCUSED
  ↓
content_factory (writes content for FOCUSED keywords first)
```
2. Source-based filtering
Filter for keywords actually getting impressions in GSC (vs aspirational ones from DataForSEO):
```
curl -G .../v1/keywords --data-urlencode "source=GOOGLE_SEARCH_CONSOLE"
```
3. Parent–child variants
A keyword can have a parentKeywordId pointing to the "head" keyword it is a long-tail variant of. This lets you cluster long-tail variants under their head term.
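Grouping by `parentKeywordId` gives you those clusters directly. This is a minimal Python sketch over keyword dicts. It follows one level of parentage only, and `cluster_by_parent` is a hypothetical helper.

```python
from collections import defaultdict


def cluster_by_parent(keywords: list[dict]) -> dict[str, list[str]]:
    """Map each head keyword's id to the keyword strings of its direct long-tail variants."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for kw in keywords:
        parent = kw.get("parentKeywordId")
        if parent is not None:  # head keywords have parentKeywordId = null
            clusters[parent].append(kw["keyword"])
    return dict(clusters)
```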
4. Bulk import + auto-label
```
curl -X POST .../v1/keywords/bulk \
  -H "Content-Type: application/json" \
  -d '{
    "keywords": [
      { "keyword": "...", "source": "USER_INPUT" },
      { "keyword": "...", "source": "USER_INPUT" }
    ],
    "label": true,
    "pillarId": "pil_pricing"
  }'
```
5. Relabel after a product change
Edit your workspace's product description, then:
```
curl -X POST .../v1/keywords/relabel \
  -H "Content-Type: application/json" \
  -d '{ "scope": { "status": "LABELED" } }'
```
The agent re-runs the 2D labeling pass with the new context.
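Before you trigger a relabel, you may want to preview which keywords a scope covers. The page shows only `{ "status": "LABELED" }`, so this minimal Python sketch assumes that each scope field is an equality filter on the keyword field of the same name and that multiple fields are ANDed together. `in_scope` and `relabel_candidates` are hypothetical helpers, not API behaviour.

```python
def in_scope(keyword: dict, scope: dict) -> bool:
    """True when every field named in the scope equals the keyword's value for it (assumed semantics)."""
    return all(keyword.get(name) == value for name, value in scope.items())


def relabel_candidates(keywords: list[dict], scope: dict) -> list[dict]:
    """Keywords a relabel request with this scope would plausibly touch, under the assumption above."""
    return [kw for kw in keywords if in_scope(kw, scope)]
```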
Related
- API: Research · keyword — full CRUD
- Concept: 2D Keyword Labelling
- Concept: Workspaces
- Concept: Agent
- Playbook: Keyword research for a brand in 20 minutes