REST API for scoring blog posts and landing pages against customizable rubrics. Each evaluation returns per-criterion scores, a weighted overall score, flagged issues, and recommended actions.

Score content against a rubric covering readability, SEO quality, brand alignment, keyword optimization, factual accuracy, and any custom criteria you define. Use it before publishing, before refreshing existing content, and as a quality gate in any agentic content pipeline.

Conceptual overview

Evaluation runs a rubric — a set of named criteria with weights — against a piece of content and returns:

Per-criterion scores (0–100) with reasoning
A weighted overall score
Specific issues flagged (sections that are too short, a missing primary keyword, broken internal links, sentences that exceed the grade-level target)
Recommended actions to lift the score

Rubrics are workspace-scoped. The platform ships with several system rubrics (for example system.seo, system.readability, and system.brand) that you can fork and customize, or you can build your own from scratch.

Endpoints

Method	Path	Purpose
POST	`/v1/produce/evaluate`	Evaluate one piece of content against a rubric
GET	`/v1/produce/evaluate`	List evaluations
GET	`/v1/produce/evaluate/{id}`	Get one evaluation
POST	`/v1/produce/evaluate/compare`	Compare two evaluations (e.g., before/after refinement)
GET	`/v1/produce/evaluate/rubric`	List rubrics
POST	`/v1/produce/evaluate/rubric`	Create a rubric
PATCH	`/v1/produce/evaluate/rubric/{id}`	Update a rubric
DELETE	`/v1/produce/evaluate/rubric/{id}`	Delete a rubric
POST	`/v1/produce/evaluate/rubric/system.{slug}/fork`	Fork a system rubric

POST /v1/produce/evaluate

{
  "contentType": "blog_post",
  "contentId": "bp_***",
  "rubricId": "rub_acme-blog-standard",
  "primaryKeywordId": "kw_***"
}

Field	Type	Required	Notes
`contentType`	`"blog_post" | "landing_page" | "content"`	yes	—
`contentId`	string	yes	—
`rubricId`	string	no	Defaults to the workspace's default rubric
`primaryKeywordId`	string	no	Used for SEO scoring; if absent, inferred from the content's linked keywords
`additionalCriteria`	object	no	Ad-hoc criteria added on top of the rubric
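A minimal sketch of building the request body client-side, for example to reject a bad `contentType` before making the call. The helper name `build_evaluate_request` is hypothetical; the field names come from the table above, and optional fields are omitted when absent so the server applies its documented defaults.

```python
import json
from typing import Optional

# Allowed contentType values, per the field table
CONTENT_TYPES = {"blog_post", "landing_page", "content"}


def build_evaluate_request(
    content_type: str,
    content_id: str,
    rubric_id: Optional[str] = None,
    primary_keyword_id: Optional[str] = None,
    additional_criteria: Optional[dict] = None,
) -> str:
    """Return a JSON body for POST /v1/produce/evaluate (hypothetical helper)."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unsupported contentType: {content_type!r}")
    body = {"contentType": content_type, "contentId": content_id}
    # Leave optional fields out entirely so the server falls back to the
    # workspace default rubric and the content's linked keywords.
    if rubric_id is not None:
        body["rubricId"] = rubric_id
    if primary_keyword_id is not None:
        body["primaryKeywordId"] = primary_keyword_id
    if additional_criteria is not None:
        body["additionalCriteria"] = additional_criteria
    return json.dumps(body)
```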

Response (the evaluation runs asynchronously)

{
  "invocationId": "inv_***",
  "agentId": "agt_***",
  "skill": "produce.evaluate",
  "status": "RUNNING",
  "estimatedCost": { "credits": 4, "durationSeconds": 25 }
}
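The call returns immediately with `"status": "RUNNING"`; the scored result arrives later on the invocation. A minimal polling sketch, assuming the invocation is read with GET `/v1/agent/invocations/{invocationId}` (the path used in use case B) and that any status other than `RUNNING` is final. To stay transport-agnostic, `fetch` is a hypothetical callable you supply that performs the GET and returns the parsed JSON.

```python
import time
from typing import Callable


def wait_for_invocation(
    invocation_id: str,
    fetch: Callable[[str], dict],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
) -> dict:
    """Poll an invocation until it is no longer RUNNING; return the final payload."""
    deadline = time.monotonic() + timeout_s
    while True:
        # fetch() is caller-supplied and should GET this path with your auth
        payload = fetch(f"/v1/agent/invocations/{invocation_id}")
        if payload.get("status") != "RUNNING":
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{invocation_id} still RUNNING after {timeout_s}s")
        time.sleep(interval_s)
```

`estimatedCost.durationSeconds` in the initial response is a reasonable basis for choosing `timeout_s`.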

Final result (from GET /v1/agent/invocations/{invocationId} once the invocation finishes)

{
  "invocationId": "inv_***",
  "agentId": "agt_***",
  "skill": "produce.evaluate",
  "status": "SUCCEEDED",
  "creditsUsed": 4,
  "result": {
    "evaluationId": "ev_***",
    "contentId": "bp_***",
    "rubricId": "rub_acme-blog-standard",
    "rubricVersion": 3,
    "overallScore": 78,
    "verdict": "MINOR_FIXES",
    "criteria": [
      {
        "name": "readability",
        "weight": 0.2,
        "score": 82,
        "reasoning": "Grade level 9.1; average sentence 17 words; intro engaging.",
        "issues": []
      },
      {
        "name": "seo_keyword_coverage",
        "weight": 0.25,
        "score": 74,
        "reasoning": "Primary keyword in H1 and 4 sub-sections. Missing from meta description.",
        "issues": [
          {
            "type": "primary_keyword_missing_meta",
            "fixHint": "Add 'engineering team capacity tracking' to meta description"
          }
        ]
      },
      {
        "name": "brand_voice",
        "weight": 0.2,
        "score": 88,
        "reasoning": "Strong second-person voice throughout. Two corporate cliches in section 4.",
        "issues": [
          {
            "type": "corporate_cliche",
            "section": 4,
            "phrase": "synergize across stakeholders"
          }
        ]
      },
      {
        "name": "depth",
        "weight": 0.2,
        "score": 70,
        "reasoning": "2,247 words. SERP top-3 average 3,100. Underdeveloped on capacity forecasting subtopic.",
        "issues": [
          { "type": "underdeveloped_subtopic", "topic": "capacity forecasting" }
        ]
      },
      {
        "name": "fact_quality",
        "weight": 0.15,
        "score": 76,
        "reasoning": "5 verifiable claims; 4 with citations. One claim ('80% of teams ...') lacks a source.",
        "issues": [
          {
            "type": "unsupported_claim",
            "section": 2,
            "claim": "80% of teams ..."
          }
        ]
      }
    ],
    "recommendedActions": [
      {
        "action": "produce.refine",
        "refinerId": "rfn_seo-cleanup",
        "expectedScoreDelta": 6
      },
      {
        "action": "produce.blog_post.regenerate",
        "scope": "section:capacity_forecasting",
        "expectedScoreDelta": 5
      }
    ]
  },
  "raw": "Scored 78. Two big lifts available: SEO cleanup (meta description) and expanding the capacity forecasting section ...",
  "files": [
    "agent-workspace/scorecard.json",
    "agent-output/evaluation-report.md"
  ]
}
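The `overallScore` above is consistent with a weighted mean of the per-criterion scores: the weights sum to 1.0, and 0.2·82 + 0.25·74 + 0.2·88 + 0.2·70 + 0.15·76 = 77.9, which rounds to 78. A sketch that recomputes it from `result.criteria`, assuming round-half-up (the exact rounding rule isn't documented here):

```python
import math


def weighted_overall(criteria: list[dict]) -> int:
    """Weighted mean of criterion scores, rounded half up to an integer."""
    total_weight = sum(c["weight"] for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria weights must sum to a positive number")
    weighted = sum(c["weight"] * c["score"] for c in criteria)
    # Dividing by total_weight keeps the result on the 0-100 scale
    return math.floor(weighted / total_weight + 0.5)


criteria = [
    {"name": "readability", "weight": 0.2, "score": 82},
    {"name": "seo_keyword_coverage", "weight": 0.25, "score": 74},
    {"name": "brand_voice", "weight": 0.2, "score": 88},
    {"name": "depth", "weight": 0.2, "score": 70},
    {"name": "fact_quality", "weight": 0.15, "score": 76},
]
print(weighted_overall(criteria))  # 78
```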

Verdict values

Verdict	Score range	Meaning
`PUBLISH_READY`	80–100	Good to ship
`MINOR_FIXES`	60–79	Refine first; specific fixes listed
`MAJOR_REWORK`	40–59	Regenerate sections or refresh thoroughly
`SCRAP`	0–39	Better to start over
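A sketch of the score-to-verdict mapping implied by the ranges in the table, assuming integer scores:

```python
def verdict_for(score: int) -> str:
    """Map an overall score (0-100) to its verdict band per the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 80:
        return "PUBLISH_READY"
    if score >= 60:
        return "MINOR_FIXES"
    if score >= 40:
        return "MAJOR_REWORK"
    return "SCRAP"
```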

POST /v1/produce/evaluate/compare

Compare two evaluations of the same (or related) content, for example before and after a refinement.

curl -X POST .../v1/produce/evaluate/compare -H 'Content-Type: application/json' -d '{
  "evaluationIds": ["ev_***A", "ev_***B"]
}'

Response

{
  "delta": {
    "overallScore": 6,
    "byCriterion": {
      "readability": 2,
      "seo_keyword_coverage": 9,
      "brand_voice": 0,
      "depth": 4,
      "fact_quality": 0
    },
    "issuesResolved": ["primary_keyword_missing_meta"],
    "issuesRemaining": ["corporate_cliche", "underdeveloped_subtopic"]
  }
}
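To make the response fields concrete, here is a client-side sketch that builds the same `byCriterion`, `issuesResolved`, and `issuesRemaining` shape from two `result` objects. It is an approximation, not the endpoint's logic: it assumes issues are matched by their `type` field alone, and it takes the overall delta as the difference of the two reported `overallScore` values.

```python
def compare_results(before: dict, after: dict) -> dict:
    """Approximate the compare endpoint's delta from two evaluation results."""
    def scores(result: dict) -> dict:
        return {c["name"]: c["score"] for c in result["criteria"]}

    def issue_types(result: dict) -> set:
        return {i["type"] for c in result["criteria"] for i in c.get("issues", [])}

    b_scores, a_scores = scores(before), scores(after)
    b_issues, a_issues = issue_types(before), issue_types(after)
    return {
        "overallScore": after["overallScore"] - before["overallScore"],
        # Only criteria present in both evaluations get a delta
        "byCriterion": {n: a_scores[n] - b_scores[n] for n in b_scores if n in a_scores},
        # Flagged before but not after
        "issuesResolved": sorted(b_issues - a_issues),
        # Every issue type still flagged in the later evaluation
        "issuesRemaining": sorted(a_issues),
    }
```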

CRUD: /v1/produce/evaluate/rubric

Create a rubric

curl -X POST .../v1/produce/evaluate/rubric -H 'Content-Type: application/json' -d '{
  "name":          "Acme blog standard",
  "description":   "Our standard rubric for blog posts",
  "appliesTo":     ["blog_post"],
  "criteria": [
    { "name": "readability",           "weight": 0.20, "type": "system.readability" },
    { "name": "seo_keyword_coverage",  "weight": 0.25, "type": "system.seo" },
    { "name": "brand_voice",           "weight": 0.20, "type": "llm_check", "prompt": "Rate how strongly this article matches our brand voice ..." },
    { "name": "depth",                 "weight": 0.20, "type": "system.depth_vs_serp" },
    { "name": "fact_quality",          "weight": 0.15, "type": "llm_check", "prompt": "Rate fact-checking quality ..." }
  ]
}'
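The example's weights add up to 1.00 (0.20 + 0.25 + 0.20 + 0.20 + 0.15). Whether the API enforces that isn't stated here, so treat this as a client-side pre-flight check under that assumption. `validate_rubric` is a hypothetical helper; it also assumes criterion names should be unique and that `llm_check` criteria need a `prompt`, as both examples above have one.

```python
import math


def validate_rubric(rubric: dict, tol: float = 1e-6) -> list[str]:
    """Return a list of problems found in a rubric body (empty if none).

    Assumed rules (not confirmed by the API): weights sum to 1.0, names are
    unique, and llm_check criteria carry a prompt.
    """
    problems = []
    criteria = rubric.get("criteria", [])
    total = sum(c.get("weight", 0) for c in criteria)
    if not math.isclose(total, 1.0, abs_tol=tol):
        problems.append(f"weights sum to {total:.2f}, expected 1.00")
    names = [c.get("name") for c in criteria]
    if len(names) != len(set(names)):
        problems.append("criterion names are not unique")
    for c in criteria:
        if c.get("type") == "llm_check" and not c.get("prompt"):
            problems.append(f"llm_check criterion {c.get('name')!r} has no prompt")
    return problems
```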

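List, update, and delete

# List rubrics
curl .../v1/produce/evaluate/rubric
# Update a rubric (body assumed to take the same fields as create)
curl -X PATCH .../v1/produce/evaluate/rubric/rub_*** -H 'Content-Type: application/json' -d '{...}'
# Delete a rubric
curl -X DELETE .../v1/produce/evaluate/rubric/rub_***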

Built-in rubrics

system.seo — keyword coverage, on-page SEO basics
system.readability — grade level, sentence length, structure
system.brand — generic brand voice check (you'll likely want to fork this)
system.depth-vs-serp — compare depth to current SERP top-3
system.fact-quality — citation density, claim verifiability

Fork a system rubric:

curl -X POST .../v1/produce/evaluate/rubric/system.seo/fork -H 'Content-Type: application/json' -d '{
  "name": "Acme SEO (customized)"
}'

MCP

> Evaluate bp_*** against our blog standard.

Claude calls produce.evaluate.score.

> Compare the eval before and after the refine.

Claude calls produce.evaluate.compare.

> What's wrong with bp_*** — just the issues, not the scores.

Claude calls produce.evaluate.score and renders only criteria[].issues[].
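Rendering only `criteria[].issues[]` amounts to flattening the issues out of the result. A sketch of that, tagging each issue with the criterion that raised it; the `criterion` key is added here for display and is not part of the API response.

```python
def issues_only(result: dict) -> list[dict]:
    """Flatten criteria[].issues[] into one list, tagged with the criterion name."""
    return [
        {"criterion": c["name"], **issue}
        for c in result.get("criteria", [])
        for issue in c.get("issues", [])
    ]
```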

Errors

Status	Code	Cause
404	`content_not_found`	No content matches the given `contentType` and `contentId`
404	`rubric_not_found`	No rubric matches `rubricId`
422	`incompatible_rubric`	Rubric's `appliesTo` doesn't include this content type

Cost

Action	Credits
Per evaluation	4 (default rubric)
Per custom criterion (LLM-based)	+1 each
Compare	free
CRUD rubrics	free
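As arithmetic: one evaluation with the default rubric plus two LLM-based custom criteria would come to 4 + 2 × 1 = 6 credits. A sketch of that estimate, assuming the +1 applies to each LLM-based criterion you add beyond the default rubric (the table doesn't spell out exactly which criteria count as custom); the constant and function names are hypothetical.

```python
BASE_CREDITS = 4              # per evaluation with the default rubric
PER_LLM_CUSTOM_CRITERION = 1  # assumed to apply per LLM-based custom criterion


def estimate_credits(evaluations: int, llm_custom_criteria: int = 0) -> int:
    """Estimate credits for a batch of evaluations; compare and rubric CRUD are free."""
    return evaluations * (BASE_CREDITS + PER_LLM_CUSTOM_CRITERION * llm_custom_criteria)
```

For example, auditing 10 posts with one custom LLM criterion each gives 10 × 5 = 50 credits under this assumption.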

Use cases (chaining endpoints together)

A. Eval gate before publish

# Eval Gate
when:
  field_lt: { evaluationScore: 75 }
then:
  action: escalate_to_approval
  reason: "Score below threshold — manual review"

produce.publish runs the eval gate before publishing; content whose evaluation score is below 75 pauses for manual approval.
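The gate rule reads as: if the evaluation score is below 75, escalate to approval. A sketch of that decision as a pure function; `"publish"` is a hypothetical name for the pass-through outcome, since only `escalate_to_approval` appears in the rule.

```python
def eval_gate(evaluation_score: float, threshold: float = 75) -> str:
    """Mirror the field_lt rule: a score strictly below the threshold escalates.

    "publish" is a hypothetical label for the pass-through case.
    """
    return "escalate_to_approval" if evaluation_score < threshold else "publish"
```

Note the boundary: `field_lt` is a strict less-than, so a score of exactly 75 passes the gate.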

B. Eval-then-refine loop

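EV=$(curl -sf -X POST .../v1/produce/evaluate -H 'Content-Type: application/json' -d '{...}' | jq -r '.invocationId')
# Wait until the invocation leaves RUNNING
while [ "$(curl -sf .../v1/agent/invocations/$EV | jq -r '.status')" = "RUNNING" ]; do sleep 5; done
ACTIONS=$(curl -sf .../v1/agent/invocations/$EV | jq -c '.result.recommendedActions')
# Pick the top recommended action (largest expectedScoreDelta), then apply it
TOP=$(echo "$ACTIONS" | jq -c 'max_by(.expectedScoreDelta)')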

The refresh_stale skill does this loop autonomously.
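The same loop in Python, as a sketch under stated assumptions: `evaluate(content_id)` and `apply_action(content_id, action)` are hypothetical callables you supply, the first returning a finished `result` object and the second applying one recommended action. The loop stops when the score reaches a target, when no actions remain, or after a fixed number of rounds, so it cannot spend credits indefinitely.

```python
from typing import Callable


def refine_until(
    content_id: str,
    evaluate: Callable[[str], dict],
    apply_action: Callable[[str, dict], None],
    target: int = 80,
    max_rounds: int = 3,
) -> dict:
    """Evaluate, apply the highest-lift recommended action, and repeat."""
    result = evaluate(content_id)
    for _ in range(max_rounds):
        actions = result.get("recommendedActions", [])
        if result["overallScore"] >= target or not actions:
            break
        # Take the action with the largest expected score delta
        best = max(actions, key=lambda a: a.get("expectedScoreDelta", 0))
        apply_action(content_id, best)
        result = evaluate(content_id)
    return result
```

The default `target` of 80 matches the lower bound of `PUBLISH_READY` in the verdict table.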

C. A/B refiner experiment

Apply two competing refiners to the same draft to produce two revisions, evaluate both, compare the two evaluations, and keep the winner.
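A sketch of the selection step, assuming you already have the two finished `result` objects. The tie-break (fewer flagged issues wins, then revision A) is an arbitrary choice for illustration, not documented behavior.

```python
def pick_winner(result_a: dict, result_b: dict) -> str:
    """Return "A" or "B": higher overallScore wins; ties go to fewer issues, then A."""
    def issue_count(r: dict) -> int:
        return sum(len(c.get("issues", [])) for c in r.get("criteria", []))

    key_a = (result_a["overallScore"], -issue_count(result_a))
    key_b = (result_b["overallScore"], -issue_count(result_b))
    return "B" if key_b > key_a else "A"
```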

D. Cross-portfolio quality audit

curl -X POST .../v1/workspaces/bulk-action -H 'Content-Type: application/json' -d '{
  "action":     "produce.evaluate.score",
  "workspaces": "all",
  "config": {
    "contentType": "blog_post",
    "scope": { "publishStatus": "PUBLISHED", "publishedAfter": "2026-04-01" }
  }
}'

Returns a quality scorecard per workspace.
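The scorecard format isn't shown here. As an illustration only, assuming you collect one `(workspace, overallScore)` pair per evaluated post (a hypothetical input shape), a per-workspace summary could be computed like this:

```python
from collections import defaultdict
from statistics import mean


def summarize(pairs: list[tuple[str, int]]) -> dict[str, dict]:
    """Per-workspace post count, mean score, and number of posts scoring below 60."""
    by_ws: dict[str, list[int]] = defaultdict(list)
    for workspace, score in pairs:
        by_ws[workspace].append(score)
    return {
        ws: {
            "count": len(scores),
            "meanScore": round(mean(scores), 1),
            # Below 60 falls in MAJOR_REWORK or SCRAP per the verdict table
            "needsRework": sum(1 for s in scores if s < 60),
        }
        for ws, scores in by_ws.items()
    }
```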

API: Production · refine
API: Production · blog post
API: Production · landing page
API: Production · publish
Concept: Eval Gates
Playbook: Refresh stale content on rank drops

Content Evaluation API — Score Articles Against SEO, Readability, and Brand Rubrics