Variant Tool—Operator Documentation

Last updated 2026-06-07 · Source: workers/variant/ + docs/variant/

The Variant Tool (/variant/) takes a headline you already wrote and hands back 3–10 (typically 5–10 after quality filtering) publication-ready alternative phrasings of the same article, plus one deliberately off-pattern "wildcard." You keep the ones you like and discard the rest. Every standard option is pre-checked against the same automated grader the Grader page runs, so anything offered already clears the quality bar. The tool can also run two opt-in levers: a headline style test (how the headline frames a trend) and an expert lever (surfacing a real expert or study the article names). Both are off by default; if you don't touch them, the tool behaves exactly as it always has.

This page is the full operator reference. For shared concepts the whole console uses—page-views, lift, vertical, the formula taxonomy, the grader's criteria—see the main docs site. For the technical architecture + D1 schema behind this tool, see Part IV—Variant tool of the main docs. Plain-language definitions here match the in-tool ? help bubbles word-for-word.

Quick start ↑ top

Open /variant/.
Paste the seed headline you want alternatives for. This is the version you wrote; the tool rephrases the same article, it doesn't pitch a different or more ambitious one. (20–500 characters.)
List your primary keywords, comma-separated (e.g. vagus nerve, anxiety). Every option the tool offers will contain all of them. Capitalization doesn't matter.
Optional: add secondary keywords (comma-separated). These earn bonus credit when a variant happens to include them, but aren't required.
Optional but recommended: paste the full article body. This is the source of truth the grader checks each variant against, and pasting it unlocks the most—and the most varied—options. (See article text below.)
Optional levers: tick the expert box (or describe one in the cite field) if the article cites one; pick a Headline type if you're running the style test. Both are covered in detail below.
Optional, under Advanced: set Format / Persona / Platform / Outlet to grade against a specific editorial rule set, and paste any prior headlines into Also avoid so the new options route around them.
Click Generate variants and wait roughly 5–15 seconds (a progress bar shows the stage).
Review the cards. Each shows the headline, its score, how different it is from your seed, its length, the grader breakdown (collapsed), and Copy + Keep buttons.
Click Keep on the option(s) you'd ship. Multi-pick is supported; the button toggles to Kept ✓ (click again to unkeep). Your picks are saved server-side and survive regenerating the same seed.
Don't love the batch? Click ↻ Regenerate for a fresh set with the same inputs. (Each generation counts once against your quota.)

The form fields ↑ top

Top to bottom, here is every input and exactly what it does. Required fields are marked; the Generate button stays disabled until the seed headline (≥20 chars) and at least one primary keyword are filled.

Field	Required?	What it does
Seed headline	Yes	The headline you wrote and want alternatives for. The tool offers rephrasings of the same article. 20–500 characters.
Primary keywords	Yes	The must-have SEO terms. Every option contains all of them (a firm editorial rule). Comma-separate. If requiring a term would box the options in too tightly, move it to Secondary instead.
Secondary keywords	No	Related terms that earn bonus credit when a variant naturally includes them, but are never required—so they don't constrain the wording.
Article text	No (recommended)	Paste the full article. This switches the tool into "maximize difference" mode (covered below): options are pushed further apart from each other, and each is fact-checked against the article instead of just your headline. With the article, options can name a specific technique, finding, expert, or comparison that's in the article but not in your headline. Up to 50,000 characters.
Expert lever (checkbox + "which expert" field)	No	Surfaces a real expert or study the article names—the single biggest proven boost to reads. See The expert lever.
Also avoid (Advanced)	No	Paste prior headlines you've already published, one per line. The new options are pushed to be distinct from all of them—handy for sibling articles in the same cluster.
Format (Advanced)	No	If the article follows a finalized content format (e.g. Everything to Know), pick it so the tool grades against that format's headline rule instead of the general one. Leave on general otherwise.
Persona (Advanced)	No	The reader to write for: Discover Browser (someone casually scrolling Google Discover who has to be hooked) or Curious Optimizer (TH) (an already-engaged Trend Hunter reader). Leave on none if neither fits.
Platform (Advanced)	No	Where the headline runs, if the destination has its own length rule: Apple News (90–120 chars), SmartNews (70–90), Trend Hunter B2C (90–104), MSN, Google News. Leave on general (about 90–104) for a standard headline.
Outlet (Advanced)	No	The specific publication, if it has its own length rule. US Weekly, Woman's World, and Life & Style use a tighter 90–100. Leave on the default (90–104) for everything else.
Headline type (TEST)	No	The style test: how the headline frames where a trend is in its life. None = a normal headline, nothing changes. See The headline style test.

About the dropdowns: only finalized formats, personas, platforms, and outlets are listed. If yours isn't there, leave the dropdown on its default—the general H1 rules apply (about 90–104 characters, no platform-specific constraint).

The headline style test ↑ top

The Headline type dropdown (marked TEST) runs an experiment: it writes your headline in one of three framings, depending on where a trend sits in its life. The goal is to learn, from real reader behavior over time, which framing performs best. It changes the headline wording only—your article body is untouched.

Leaving it on "None" gives you today's tool, unchanged. The style test is entirely opt-in.

The three framings

Type	The angle	How the headline sounds
Established `CTRL`	The settled, authority-backed take. The trend is recognized; you're explaining it. This is also the experimental baseline.	"…Explained" · "What the Research Says…" · "Scientists Confirm…"
Emerging `EMG`	The trend is just catching on; the reader is ahead of the curve. Don't overstate it.	"Just Starting to…" · "Is Growing" · "Why Experts Are Increasingly…"
Forecast `FWD`	Look ahead: where this is headed next.	"What Comes Next…" · "Where [X] Is Headed" · "What This Means for You"

The one rule that keeps the test fair

Within a cluster—the set of articles you write for one trend—give each article a different style. Never repeat a style inside a cluster. If a trend gets three articles, make one Established, one Emerging, one Forecast. That balance is what lets the data tell whether a style drove a result rather than just the topic. This rule lives in your workflow, not in the tool—the tool generates one type per run and has no view of the rest of the cluster.

How you confirm a variant carries the style

When you've picked a Headline type, each result card adds a small framing chip so you can see, at a glance, whether the option actually carries the wording you asked for:

✓ EMG framing — the option carries the trend-stage wording you chose. Good to publish as that test type.
ⓘ EMG: off-pattern — it doesn't use the typical EMG signal words, so it's tagged "off-pattern" for the rolling framing study. It still cleared every quality check and lost no score—keep it if it reads well.

The framing chip is a labeling tag, not a quality flag—it never changes the score. The style under test is an open hypothesis, so the tool refuses to let an unproven framing pull down a headline's grade or block it from the pool. The tag just records whether the option carried the chosen wording, so your "which style won" comparison later is clean—an "off-pattern" option is a perfectly good headline, just not a clean data point for that arm. (The expert lever below is different—it is scored, because it's already proven.)

The expert lever ↑ top

Surfacing a real, credentialed expert or a specific study in the headline is the single biggest proven lever for reads—the June 2026 headline analysis measured it as the highest-impact finding by a wide margin. The expert lever turns that finding on for any generation. It sits just under the article body, because the expert has to come from the article.

It has two controls, which work together:

The checkbox—"Article names an expert or cites a study — surface it." Tick it when the article body names a credentialed expert (a doctor, researcher, professor) or cites a specific study, institution, or statistic. The tool will feature that expert or study in the headline.
The "which expert / study" field (optional free text). Describe the one you want surfaced however you like—a precise name, a study, or a plain phrase like "the sleep doctor" or "the magnesium study in paragraph two." The tool matches your description to the article, so you don't need exact wording. Leave it blank to let the tool pick the strongest one from the body.

How the two controls interact

Typing a name in the cite field turns the lever on by itself—you don't also need to tick the box. The tool resolves the controls by precedence: a named cite wins (it's the most specific instruction), then the checkbox, then an always-on background scan of the article for expert/study signals. So the cite field takes precedence over the checkbox: name an expert and the lever fires regardless of the box's state.

The hard guarantee: it never invents an expert

The expert or study must actually appear in the article body you pasted. The tool will never invent one. If your article names no expert and states no study or statistic, you simply get a normal headline—no fabricated credential, name, study, year, or percentage. This is enforced in the generation prompt itself (a body-only anti-fabrication rule) and is treated as a hard failure if violated, the same anti-invention floor the tool already applies to every factual claim.

Because the expert lever is a proven result rather than an open hypothesis, it is scored and enforced: when you flag an expert and the article supports one, a variant that fails to surface it is penalized by the grader (it carries real weight in the score). That's the deliberate opposite of the style test's framing chip, which is never scored. Both still share the same floor: the tool can only surface what the article actually contains.

Reading the result cards ↑ top

Each option comes back as a card: the headline, a row of small chips, a collapsible grader breakdown, and Copy + Keep buttons. An always-visible legend at the top of the results repeats these definitions, and every chip also carries the same plain explanation on hover or long-press. Here is the full set.

The chips on each card

Chip	Looks like	What it means
score	score 92	Quality score, 0–100, from the same automated grader the Grader page uses. Higher is better. Every standard (non-wildcard) option shown scored 85 or above—the tool already dropped anything lower—so use this to rank the options, not to decide pass/fail (they all passed).
cosine (vs your headline)	cosine 0.62 vs your headline	How close this option is in meaning to the headline you pasted. Cosine = meaning-similarity on a 0.00–1.00 scale: 0.00 = totally different, 1.00 = basically identical. Lower means a more genuinely different rewrite. Near-duplicates were already filtered out.
cosine (vs avoid-list)	cosine 0.41 vs avoid-list	Only appears if you used Also avoid or the tool auto-avoided prior picks from the same cluster. It's the meaning-similarity to the closest headline on that avoid-list, so you can confirm this option doesn't collide with something you've already published. Lower is safer.
chars	98 chars	The option's exact character count, spaces included. The general target band is about 90–104. Options that run slightly over (up to ~115) now come through carrying an amber ⚑ flag instead of being dropped—trim or ship at your judgment. A chosen Platform or Outlet narrows the band (e.g. Apple News 90–120).
⚑ flag	⚑ 6 over target	An editorial-judgment flag: the option cleared every hard check but trips a relaxable preference you should weigh—today, running a few characters over the 104 target (within ~11). The tool surfaces it so you decide whether to trim or keep it. It's never a defect and never lowered the score. More relaxable checks may flag here over time.
wildcard	⚡ wildcard	The one experimental, deliberately off-pattern option per batch (only the card tagged ⚡). It's checked for hard rules and accuracy but skips the style judgments the others pass, so its score isn't comparable to theirs. Read it as a creative prompt and verify it editorially before publishing.
framing ✓ / ⓘ	✓ FWD framing ⓘ FWD: off-pattern	Only appears when you picked a Headline type. Green ✓ = the option carries the trend-stage wording you chose; ⓘ off-pattern = it doesn't use that typical wording, so it's tagged for the rolling framing study. Labeling tag, not a quality flag—it never changes the score, and an off-pattern option is still a fine headline. (See the style test.)
accuracy (wildcards only)	accuracy: article-supported	The accuracy safety-net note for a wildcard: the grader's short reason for judging it factually consistent with your headline (and the article, if you pasted one). Wildcards skip the style checks, so this is the one substantive check they get.

The grader breakdown

Click the grader breakdown line on any card to expand the per-rule detail behind its score—the same line-by-line readout the Grader page shows:

✓ = passed that rule · ✗ = failed it · · = the rule doesn't apply to this headline.

Rules cover length, named-entity lead, formula, keyword presence, active voice, no lead burial, curiosity gap, factual accuracy, the punctuation ban, and—when you flagged one—the expert/study signal. Rules that don't apply (e.g. an Apple News title length when no Apple News title was supplied, or the expert check when no expert was flagged) show · and drop out of the score entirely; they neither help nor hurt.

The run-stats row

The small grey row under the "Variants" heading is diagnostics—you don't need to act on it. It reports what happened during the run:

article-mode (max diversity) article-mode vs default mode—whether you pasted an article body (and so got the more-varied, article-fact-checked options).
rej cosine / rej grader / rej diversity—how many candidates were dropped for being too similar to your seed, for scoring too low, or for being too alike each other.
regen passes—how many extra retry rounds it took to fill the set.
offered—how many it produced (rule-following + wildcard).
est cost / duration—the run's estimated spend and time.
req—the run's ID; quote it if you ever need to ask about a specific run.
"N prior cluster picks auto-avoided"—headlines you previously kept that follow the same pattern were automatically added to the avoid-list, so these options are pushed to differ from what you've already published.

What the score floor guarantees ↑ top

Every standard option offered for selection has already cleared two bars, so you should never see a low-quality option:

Hard structural rules pass at 100%. Named-entity lead, formula present, every primary keyword, no banned punctuation, no all-caps (outside known acronyms). A candidate that misses any of these is dropped silently—non-negotiable. Length is enforced too, with one tolerance: an option a few characters over the 90–104 target (up to ~115) comes through carrying an amber ⚑ editorial-judgment flag rather than being dropped, so you get a fuller set and decide whether to trim. (The Grader tab still hard-fails any headline over target; only the variant tool offers this tolerance.)
The graded score is ≥ 85. This covers the judgment calls—active voice, no lead burial, curiosity gap, factual accuracy. The 85 floor tolerates a single borderline stylistic miss but rejects combinations or any substantive issue (a buried lead, an accuracy problem). When you paste the article body, accuracy is judged against the article; without it, against the headline alone.

Candidates that fail either bar drop out, and the pipeline regenerates with explicit failure feedback for up to two extra passes to refill the set. If it still can't reach your requested count, you get fewer options plus a "what to try" panel and a one-click Regenerate—not filler.

One exception: the ⚡ wildcard is intentionally allowed to skip the style judgments (see below), so its score isn't comparable. It still passes every hard structural rule plus an accuracy check.

The wildcard slot ↑ top

Every generation returns exactly one option tagged ⚡ wildcard. It's written under a different prompt that's encouraged to depart structurally from a typical headline for your format and persona. It must still pass every hard rule (length, named-entity lead, no banned patterns) plus a single accuracy safety-net check—but it intentionally bypasses the rule-following style judgments (active voice, lead burial, curiosity), because those would fight its job of being structurally novel.

Verify wildcards editorially before publishing. They're off-pattern by design—read each as a creative prompt, not a vetted option.

The wildcard isn't only for you. Every wildcard offered—kept or not—feeds a long-term study aimed at discovering new high-performing headline formats. scripts/analyze_variant_cohort.py periodically scans the accumulated wildcards, clusters them by structural fingerprint (formula × leading part-of-speech × part-of-speech sequence (the grammatical shape of how the words are arranged)), and surfaces any pattern whose pick-rate or post-publish performance materially beats the rule-following baseline. Patterns that clear the significance bars become candidates for promotion into the standard format library.

Why no variant-vs-variant diversity gates ↑ top

The options are alternatives for you to pick from, not a set published together. You keep one (or a few) and discard the rest, so two options sharing a leading word, a formula, or some content tokens is fine—they're competing, not co-published. This is a key way the Variant Tool differs from the routine grader of live headlines, which evaluates published output against editorial diversity standards. Different setting, different rules.

There's one nuance. When you paste the article body, the tool switches into "maximize difference" mode and does push the standard options apart from each other—because the article gives the model real, fact-supported latitude (different facets, mechanisms, audiences, comparisons) to diverge legitimately. Without an article, no pairwise diversity gate applies between options.

Concerns about cross-publication similarity—sibling headlines in one cluster looking alike on the live site—are handled by the Also avoid input plus automatic cluster auto-avoid: prior picks from the same cluster are routed around via cosine. Here are the gates that are always active:

Gate	Threshold	What it catches
Cosine vs your seed headline	< 0.97 (0.93 in article-mode)	Near-literal duplicates of the source. Tighter in article-mode, where the article gives options room to legitimately diverge.
Cosine vs each "also avoid" entry	< 0.97	Collisions with headlines you've already published from the same cluster.
Pairwise cosine + word-overlap between standard options	cosine < 0.85, jaccard < 0.45 (article-mode only)	Forces facet diversity across the set. Jaccard = word-overlap (the share of distinct words two headlines have in common, 0–1).
Wildcard cosine vs seed	< 0.95	The wildcard's own ceiling—the cohort study values structurally new patterns over ones that are merely different in meaning.
Grader hard rules	100% pass (length: soft over-tolerance)	Named-entity lead, formula, primary keywords, banned patterns—any miss = silent drop. Applies to every option including the wildcard. Length is the one band with give: up to ~11 chars over the 104 target soft-passes with a ⚑ flag; only options past ~115 (or under the 75 floor) drop.
Grader score (standard options)	≥ 85	Active voice, no lead burial, curiosity gap, accuracy. Tolerates one borderline miss, rejects substantive ones.
Formula cap (per call)	≤ 2 per formula	No single headline formula hogs the set. If three or more options share a formula, the lowest-scoring excess are dropped. Wildcards count toward the cap.

The per-headline "receipt" ↑ top

Every headline the tool generates records the exact settings that produced it—think of it as a receipt. That includes which style framing was used, whether the expert lever fired and how, the article-vs-headline-only mode, the active grader thresholds, the models, and the exact rule wording in force at the time. The receipt is content-addressed: identical settings produce an identical short fingerprint, and any change to those settings produces a different one.

Why this matters: when reader-performance data arrives weeks later, the eventual "which style or expert mode won" comparison can be honest, because each headline points back to the precise configuration behind it—rather than guessing which wording was live when it ran. Two headlines with the same fingerprint provably ran under the same settings; two with different fingerprints provably differed.

This is data capture, not a control you operate: it happens automatically on every run and changes nothing about how you use the tool. (Developers: the fingerprint is the spec_hash over a canonicalized config object; see For developers and dev-docs/headline-experimentation/spec.md §5.7.)

Rate limits + budget ↑ top

Usage is capped to keep the shared spend in check. All caps are visible in the quota pills at the top of the page, which turn amber as you approach a limit and red when you hit one. The pills track generations (one per Generate or Regenerate):

10 generations / user / day (RATE_DAILY). Resets at midnight UTC.
200 generations / user / month (RATE_MONTHLY). Resets on the 1st of the month, UTC.
Weekday-only (Monday–Friday, Eastern time). The tool is closed on weekends for capped users—it doesn't fit the off-hours workflow. It's back online Monday.
$70 / month shared team budget (BUDGET_MONTHLY_USD) in estimated model spend—roughly 700 runs at the current per-call cost. Everyone draws from this one budget; the team pill shows the estimated runs left for the whole team. A Slack alert fires at 80% (BUDGET_ALERT_FRACTION), and again at 100%.
Unlimited-users exception (UNLIMITED_USER_EMAILS)—listed people (Pierce is on this list) bypass the per-user daily/monthly caps and the weekend gate. Their usage still counts and shows in the pills for transparency; they're just never blocked.

When you hit a cap the tool returns the relevant code—ERR_RATE_DAILY, ERR_RATE_MONTHLY, ERR_WEEKEND, or ERR_BUDGET—with a plain-language recovery message. Counting is atomic: concurrent runs can't over-shoot the limits, and a run that fails because its background job was killed is automatically refunded against your daily/monthly count (the team budget isn't refunded, since the model calls genuinely happened).

Your full history is one click away—My usage log shows a per-day breakdown and your recent generations, read straight from the server for your authenticated email only.

What gets stored ↑ top

The tool persists three layers, so the long-term study and the honest A/B comparison have the data they need:

Every generation call—an audit row (cost, rejection counts, regen passes, calibration version, prompt hashes, model assignment, the seed fingerprint, and the receipt fingerprint). Powers cost monitoring, rejection-rate metrics, and cross-run kept-state hydration.
Every option offered (standard + wildcard, picked or not)—stored with its full structural fingerprint, the receipt fingerprint, and the gate that dropped it if it failed one. This is the feedstock for the cohort study.
Every option you Keep—stored with the cohort flag, the full grader breakdown, and the fingerprint. Unkeeping soft-deletes (it preserves the decision history rather than erasing it). Kept options are later joined to published-article outcomes for performance analysis.

Anonymization ↑ top

Colleague names never reach the model and never leave the tool. The denylist is loaded at request time from a Cloudflare Secret (DENYLIST_JSON)—never bundled into the build, never committed to git. Your inputs (headline, keywords, article, also-avoid, expert cite) are scrubbed before being sent to the model, and outputs are scrubbed again before being returned. Pierce is the only permitted name. If a generated option contains a denylist name, it's dropped from the pool—the same handling as a hard-rule grader failure. A name appearing in the article body also changes the receipt fingerprint, so a mid-test denylist rotation can't silently merge two different-configuration runs.

For developers ↑ top

JS ↔ Python grader parity contract

The Variant Worker ports generate_grader.py's rule-based criteria to JS at workers/variant/src/grader.js. The port is guarded by tests/test_grader_js_parity.py, which runs a fixture of synthetic headlines through both implementations and asserts identical pass/fail per criterion; any drift fails CI. The formula taxonomy (FORMULA_TAXONOMY in grader.js) mirrors formula_taxonomy.py, shared with the Findings classifier and the daily grader.

Two parity caveats specific to the variant tool:

Char-range override. The grader default (CHAR_DEFAULT = 90–104, the "TH Headline Analysis 2026-06-02" sweet spot) is what both the daily grader and the parity test grade against. The variant tool passes its own charRangeOverride at runtime, realigned (2026-06-05) to the same 90–104 target so the live tool, the daily grader, and the parity harness agree on the band. On top of that the variant tool adds an 11-char editorial over-tolerance (VARIANT_CHAR_RANGE.over_tol, 2026-06-16): 105–115-char options soft-pass carrying an amber ⚑ flag rather than dropping. This is variant-tool-only—the daily grader and the parity harness carry no over_tol, so both still hard-fail over target. The Python grader has no override hook—it's pinned at 90–104—so neither the band nor the over-tolerance can be mirrored to Python.
LLM-judged criteria are not in the parity file. active_voice, no_lead_burial, curiosity, and accurate are evaluated by a batched model call in llm.js, not by grader.js. The expert_signal criterion is a scored, parity-locked rule (weight 2, deterministic regex, N/A when no expert is flagged or present); the framing-conformance check (validateFraming) is JS-only and deliberately never enters score().

Worker endpoints

POST  /generate-async           —enqueue a generation job; returns a job_id in <1s (the client polls)
POST  /generate                 —synchronous full pipeline (auth + rate-limit; atomic slot reservation)
POST  /regenerate               —alias for /generate; client passes a fresh nonce
GET   /job/:id                  —poll an async job (stage + terminal status complete | failed)
POST  /select                   —Keep (idempotent); clears unkept_at on re-keep; backfills offered+fingerprint
POST  /unselect                 —soft-delete a kept variant (cross-generation via seed_fingerprint; email-scoped)
POST  /outcome                  —link a kept selection → published article (rejects soft-deleted selections)
GET   /healthz                  —DB ping + binding check (unauthenticated)
GET   /healthz/deep             —additionally pings Workers AI for a live latency check (unauthenticated)
GET   /seed/:fingerprint/kept   —list active kept variants for a seed (cross-generation hydration)
GET   /metrics/quota-mine       —caller's own usage + team capacity (auth required)
GET   /metrics/my-history?days= —caller's own per-day call log + recent generations
GET   /metrics/quota-by-user    —admin: per-user usage today + month (token-gated)
GET   /metrics/cohort-counts    —selections by cohort (lifetime + currently-active)
GET   /metrics/diversity-rejection-rate?since=
GET   /export/offered-candidates?since=   —canonical export (rule-following + wildcard, every gate)
GET   /export/wildcards-offered?since=     —legacy wildcard-only export
GET   /export/selections?since=&active_only=&include_emails=&token=  —kept decisions (token reveals email)

Generation pipeline (in order)

/generate-async enqueues a Cloudflare Queue job; the consumer runs the pipeline in a dedicated invocation with a full runtime budget (the durable fix for the prior background-timeout class). A job that fails to ack goes to a dead-letter queue and is terminalized + quota-refunded—never re-run, so no double-charge. The pipeline stages the client surfaces as a progress label: queued → generating → scoring & grading → filtering for diversity → regenerating to fix gaps → saving → done.

Model assignments

Set in wrangler.toml by prefix: names starting with @cf/ route to Cloudflare Workers AI (free); anything else routes to Anthropic (paid, billed against the team budget). The current deploy uses Claude Sonnet for rule-following generation and the wildcard, and Claude Haiku for grading. Embeddings (the cosine similarity) use Workers AI bge-large-en-v1.5 at the edge—no third-party round-trip.

The config-spec receipt (developer view)

The receipt is the spec_hash: a SHA-256 over a canonicalized JSON config object (canonicalJSONStringify in util.js recursively sorts keys so key order can't change the hash). It captures the headline type, expert delivery mode, article-mode, the active grader thresholds, the verbatim volatile rule-set content, the models, the worker build SHA, and a content-hash of the active denylist. It's written INSERT OR IGNORE to variant_config_specs and stamped on both variant_generations and each variant_offered_candidates row, so any single headline reconstructs its exact config without a join. Full field-by-field rationale: dev-docs/headline-experimentation/spec.md §5.7.

Refinement loop

scripts/analyze_variant_cohort.py reads /export/selections + /export/offered-candidates and emits docs/data/variant-refinement.json (programmatic findings) and docs/variant/refinement-report.html (operator summary with candidate new formats). Significance gates before a pattern is surfaced: ≥30 wildcards, ≥100 rule-following selections, ≥5 wildcards per pattern cluster, and a bootstrap confidence interval on the PV ratio that excludes 1.0.

Source

Worker: workers/variant/ (src/index.js pipeline · src/grader.js criteria · src/llm.js prompts · src/util.js rate limits / cosine / jaccard / scrub · wrangler.toml caps + models)
Page: docs/variant/ + docs/css/variant.css + docs/js/variant.js
Design: dev-docs/headline-experimentation/spec.md (style test · expert lever · the receipt)
Refinement: scripts/analyze_variant_cohort.py · Parity test: tests/test_grader_js_parity.py

See also the comprehensive Part IV—Variant tool section of the main docs site (architecture · D1 schema · cohort study · troubleshooting · glossary).