Headlines Analytics Console

Variant Tool—Operator Documentation

Last updated 2026-06-07 · Source: workers/variant/ + docs/variant/

The Variant Tool (/variant/) takes a headline you already wrote and hands back 3–10 (typically 5–10 after quality filtering) publication-ready alternative phrasings of the same article, plus one deliberately off-pattern "wildcard." You keep the ones you like and discard the rest. Every standard option is pre-checked against the same automated grader the Grader page runs, so anything offered already clears the quality bar. The tool can also run two opt-in levers: a headline style test (how the headline frames a trend) and an expert lever (surfacing a real expert or study the article names). Both are off by default; if you don't touch them, the tool behaves exactly as it always has.

This page is the full operator reference. For shared concepts the whole console uses—page-views, lift, vertical, the formula taxonomy, the grader's criteria—see the main docs site. For the technical architecture + D1 schema behind this tool, see Part IV—Variant tool of the main docs. Plain-language definitions here match the in-tool ? help bubbles word-for-word.

Quick start ↑ top

  1. Open /variant/.
  2. Paste the seed headline you want alternatives for. This is the version you wrote; the tool rephrases the same article, it doesn't pitch a different or more ambitious one. (20–500 characters.)
  3. List your primary keywords, comma-separated (e.g. vagus nerve, anxiety). Every option the tool offers will contain all of them. Capitalization doesn't matter.
  4. Optional: add secondary keywords (comma-separated). These earn bonus credit when a variant happens to include them, but aren't required.
  5. Optional but recommended: paste the full article body. This is the source of truth the grader checks each variant against, and pasting it unlocks the most—and the most varied—options. (See article text below.)
  6. Optional levers: tick the expert box (or describe one in the cite field) if the article cites one; pick a Headline type if you're running the style test. Both are covered in detail below.
  7. Optional, under Advanced: set Format / Persona / Platform / Outlet to grade against a specific editorial rule set, and paste any prior headlines into Also avoid so the new options route around them.
  8. Click Generate variants and wait roughly 5–15 seconds (a progress bar shows the stage).
  9. Review the cards. Each shows the headline, its score, how different it is from your seed, its length, the grader breakdown (collapsed), and Copy + Keep buttons.
  10. Click Keep on the option(s) you'd ship. Multi-pick is supported; the button toggles to Kept ✓ (click again to unkeep). Your picks are saved server-side and survive regenerating the same seed.
  11. Don't love the batch? Click ↻ Regenerate for a fresh set with the same inputs. (Each generation counts once against your quota.)

The form fields ↑ top

Top to bottom, here is every input and exactly what it does. Required fields are marked; the Generate button stays disabled until the seed headline (≥20 chars) and at least one primary keyword are filled.

FieldRequired?What it does
Seed headline Yes The headline you wrote and want alternatives for. The tool offers rephrasings of the same article. 20–500 characters.
Primary keywords Yes The must-have SEO terms. Every option contains all of them (a firm editorial rule). Comma-separate. If requiring a term would box the options in too tightly, move it to Secondary instead.
Secondary keywords No Related terms that earn bonus credit when a variant naturally includes them, but are never required—so they don't constrain the wording.
Article text No (recommended) Paste the full article. This switches the tool into "maximize difference" mode (covered below): options are pushed further apart from each other, and each is fact-checked against the article instead of just your headline. With the article, options can name a specific technique, finding, expert, or comparison that's in the article but not in your headline. Up to 50,000 characters.
Expert lever (checkbox + "which expert" field) No Surfaces a real expert or study the article names—the single biggest proven boost to reads. See The expert lever.
Also avoid (Advanced) No Paste prior headlines you've already published, one per line. The new options are pushed to be distinct from all of them—handy for sibling articles in the same cluster.
Format (Advanced) No If the article follows a finalized content format (e.g. Everything to Know), pick it so the tool grades against that format's headline rule instead of the general one. Leave on general otherwise.
Persona (Advanced) No The reader to write for: Discover Browser (someone casually scrolling Google Discover who has to be hooked) or Curious Optimizer (TH) (an already-engaged Trend Hunter reader). Leave on none if neither fits.
Platform (Advanced) No Where the headline runs, if the destination has its own length rule: Apple News (90–120 chars), SmartNews (70–90), Trend Hunter B2C (90–104), MSN, Google News. Leave on general (about 90–104) for a standard headline.
Outlet (Advanced) No The specific publication, if it has its own length rule. US Weekly, Woman's World, and Life & Style use a tighter 90–100. Leave on the default (90–104) for everything else.
Headline type (TEST) No The style test: how the headline frames where a trend is in its life. None = a normal headline, nothing changes. See The headline style test.

About the dropdowns: only finalized formats, personas, platforms, and outlets are listed. If yours isn't there, leave the dropdown on its default—the general H1 rules apply (about 90–104 characters, no platform-specific constraint).

The headline style test ↑ top

The Headline type dropdown (marked TEST) runs an experiment: it writes your headline in one of three framings, depending on where a trend sits in its life. The goal is to learn, from real reader behavior over time, which framing performs best. It changes the headline wording only—your article body is untouched.

Leaving it on "None" gives you today's tool, unchanged. The style test is entirely opt-in.

The three framings

TypeThe angleHow the headline sounds
Established
CTRL
The settled, authority-backed take. The trend is recognized; you're explaining it. This is also the experimental baseline. "…Explained" · "What the Research Says…" · "Scientists Confirm…"
Emerging
EMG
The trend is just catching on; the reader is ahead of the curve. Don't overstate it. "Just Starting to…" · "Is Growing" · "Why Experts Are Increasingly…"
Forecast
FWD
Look ahead: where this is headed next. "What Comes Next…" · "Where [X] Is Headed" · "What This Means for You"

The one rule that keeps the test fair

Within a cluster—the set of articles you write for one trend—give each article a different style. Never repeat a style inside a cluster. If a trend gets three articles, make one Established, one Emerging, one Forecast. That balance is what lets the data tell whether a style drove a result rather than just the topic. This rule lives in your workflow, not in the tool—the tool generates one type per run and has no view of the rest of the cluster.

How you confirm a variant carries the style

When you've picked a Headline type, each result card adds a small framing chip so you can see, at a glance, whether the option actually carries the wording you asked for:

The framing chip is a labeling tag, not a quality flag—it never changes the score. The style under test is an open hypothesis, so the tool refuses to let an unproven framing pull down a headline's grade or block it from the pool. The tag just records whether the option carried the chosen wording, so your "which style won" comparison later is clean—an "off-pattern" option is a perfectly good headline, just not a clean data point for that arm. (The expert lever below is different—it is scored, because it's already proven.)

The expert lever ↑ top

Surfacing a real, credentialed expert or a specific study in the headline is the single biggest proven lever for reads—the June 2026 headline analysis measured it as the highest-impact finding by a wide margin. The expert lever turns that finding on for any generation. It sits just under the article body, because the expert has to come from the article.

It has two controls, which work together:

How the two controls interact

Typing a name in the cite field turns the lever on by itself—you don't also need to tick the box. The tool resolves the controls by precedence: a named cite wins (it's the most specific instruction), then the checkbox, then an always-on background scan of the article for expert/study signals. So the cite field takes precedence over the checkbox: name an expert and the lever fires regardless of the box's state.

The hard guarantee: it never invents an expert

The expert or study must actually appear in the article body you pasted. The tool will never invent one. If your article names no expert and states no study or statistic, you simply get a normal headline—no fabricated credential, name, study, year, or percentage. This is enforced in the generation prompt itself (a body-only anti-fabrication rule) and is treated as a hard failure if violated, the same anti-invention floor the tool already applies to every factual claim.

Because the expert lever is a proven result rather than an open hypothesis, it is scored and enforced: when you flag an expert and the article supports one, a variant that fails to surface it is penalized by the grader (it carries real weight in the score). That's the deliberate opposite of the style test's framing chip, which is never scored. Both still share the same floor: the tool can only surface what the article actually contains.

Reading the result cards ↑ top

Each option comes back as a card: the headline, a row of small chips, a collapsible grader breakdown, and Copy + Keep buttons. An always-visible legend at the top of the results repeats these definitions, and every chip also carries the same plain explanation on hover or long-press. Here is the full set.

The chips on each card

ChipLooks likeWhat it means
score score 92 Quality score, 0–100, from the same automated grader the Grader page uses. Higher is better. Every standard (non-wildcard) option shown scored 85 or above—the tool already dropped anything lower—so use this to rank the options, not to decide pass/fail (they all passed).
cosine (vs your headline) cosine 0.62 vs your headline How close this option is in meaning to the headline you pasted. Cosine = meaning-similarity on a 0.00–1.00 scale: 0.00 = totally different, 1.00 = basically identical. Lower means a more genuinely different rewrite. Near-duplicates were already filtered out.
cosine (vs avoid-list) cosine 0.41 vs avoid-list Only appears if you used Also avoid or the tool auto-avoided prior picks from the same cluster. It's the meaning-similarity to the closest headline on that avoid-list, so you can confirm this option doesn't collide with something you've already published. Lower is safer.
chars 98 chars The option's exact character count, spaces included. The general target band is about 90–104. Options that run slightly over (up to ~115) now come through carrying an amber flag instead of being dropped—trim or ship at your judgment. A chosen Platform or Outlet narrows the band (e.g. Apple News 90–120).
⚑ flag ⚑ 6 over target An editorial-judgment flag: the option cleared every hard check but trips a relaxable preference you should weigh—today, running a few characters over the 104 target (within ~11). The tool surfaces it so you decide whether to trim or keep it. It's never a defect and never lowered the score. More relaxable checks may flag here over time.
wildcard ⚡ wildcard The one experimental, deliberately off-pattern option per batch (only the card tagged ⚡). It's checked for hard rules and accuracy but skips the style judgments the others pass, so its score isn't comparable to theirs. Read it as a creative prompt and verify it editorially before publishing.
framing ✓ / ⓘ ✓ FWD framing   ⓘ FWD: off-pattern Only appears when you picked a Headline type. Green ✓ = the option carries the trend-stage wording you chose; ⓘ off-pattern = it doesn't use that typical wording, so it's tagged for the rolling framing study. Labeling tag, not a quality flag—it never changes the score, and an off-pattern option is still a fine headline. (See the style test.)
accuracy (wildcards only) accuracy: article-supported The accuracy safety-net note for a wildcard: the grader's short reason for judging it factually consistent with your headline (and the article, if you pasted one). Wildcards skip the style checks, so this is the one substantive check they get.

The grader breakdown

Click the grader breakdown line on any card to expand the per-rule detail behind its score—the same line-by-line readout the Grader page shows:

Rules cover length, named-entity lead, formula, keyword presence, active voice, no lead burial, curiosity gap, factual accuracy, the punctuation ban, and—when you flagged one—the expert/study signal. Rules that don't apply (e.g. an Apple News title length when no Apple News title was supplied, or the expert check when no expert was flagged) show · and drop out of the score entirely; they neither help nor hurt.

The run-stats row

The small grey row under the "Variants" heading is diagnostics—you don't need to act on it. It reports what happened during the run:

What the score floor guarantees ↑ top

Every standard option offered for selection has already cleared two bars, so you should never see a low-quality option:

Candidates that fail either bar drop out, and the pipeline regenerates with explicit failure feedback for up to two extra passes to refill the set. If it still can't reach your requested count, you get fewer options plus a "what to try" panel and a one-click Regenerate—not filler.

One exception: the ⚡ wildcard is intentionally allowed to skip the style judgments (see below), so its score isn't comparable. It still passes every hard structural rule plus an accuracy check.

The wildcard slot ↑ top

Every generation returns exactly one option tagged ⚡ wildcard. It's written under a different prompt that's encouraged to depart structurally from a typical headline for your format and persona. It must still pass every hard rule (length, named-entity lead, no banned patterns) plus a single accuracy safety-net check—but it intentionally bypasses the rule-following style judgments (active voice, lead burial, curiosity), because those would fight its job of being structurally novel.

Verify wildcards editorially before publishing. They're off-pattern by design—read each as a creative prompt, not a vetted option.

The wildcard isn't only for you. Every wildcard offered—kept or not—feeds a long-term study aimed at discovering new high-performing headline formats. scripts/analyze_variant_cohort.py periodically scans the accumulated wildcards, clusters them by structural fingerprint (formula × leading part-of-speech × part-of-speech sequence (the grammatical shape of how the words are arranged)), and surfaces any pattern whose pick-rate or post-publish performance materially beats the rule-following baseline. Patterns that clear the significance bars become candidates for promotion into the standard format library.

Why no variant-vs-variant diversity gates ↑ top

The options are alternatives for you to pick from, not a set published together. You keep one (or a few) and discard the rest, so two options sharing a leading word, a formula, or some content tokens is fine—they're competing, not co-published. This is a key way the Variant Tool differs from the routine grader of live headlines, which evaluates published output against editorial diversity standards. Different setting, different rules.

There's one nuance. When you paste the article body, the tool switches into "maximize difference" mode and does push the standard options apart from each other—because the article gives the model real, fact-supported latitude (different facets, mechanisms, audiences, comparisons) to diverge legitimately. Without an article, no pairwise diversity gate applies between options.

Concerns about cross-publication similarity—sibling headlines in one cluster looking alike on the live site—are handled by the Also avoid input plus automatic cluster auto-avoid: prior picks from the same cluster are routed around via cosine. Here are the gates that are always active:

GateThresholdWhat it catches
Cosine vs your seed headline< 0.97  (0.93 in article-mode)Near-literal duplicates of the source. Tighter in article-mode, where the article gives options room to legitimately diverge.
Cosine vs each "also avoid" entry< 0.97Collisions with headlines you've already published from the same cluster.
Pairwise cosine + word-overlap between standard optionscosine < 0.85, jaccard < 0.45  (article-mode only)Forces facet diversity across the set. Jaccard = word-overlap (the share of distinct words two headlines have in common, 0–1).
Wildcard cosine vs seed< 0.95The wildcard's own ceiling—the cohort study values structurally new patterns over ones that are merely different in meaning.
Grader hard rules100% pass (length: soft over-tolerance)Named-entity lead, formula, primary keywords, banned patterns—any miss = silent drop. Applies to every option including the wildcard. Length is the one band with give: up to ~11 chars over the 104 target soft-passes with a ⚑ flag; only options past ~115 (or under the 75 floor) drop.
Grader score (standard options)≥ 85Active voice, no lead burial, curiosity gap, accuracy. Tolerates one borderline miss, rejects substantive ones.
Formula cap (per call)≤ 2 per formulaNo single headline formula hogs the set. If three or more options share a formula, the lowest-scoring excess are dropped. Wildcards count toward the cap.

The per-headline "receipt" ↑ top

Every headline the tool generates records the exact settings that produced it—think of it as a receipt. That includes which style framing was used, whether the expert lever fired and how, the article-vs-headline-only mode, the active grader thresholds, the models, and the exact rule wording in force at the time. The receipt is content-addressed: identical settings produce an identical short fingerprint, and any change to those settings produces a different one.

Why this matters: when reader-performance data arrives weeks later, the eventual "which style or expert mode won" comparison can be honest, because each headline points back to the precise configuration behind it—rather than guessing which wording was live when it ran. Two headlines with the same fingerprint provably ran under the same settings; two with different fingerprints provably differed.

This is data capture, not a control you operate: it happens automatically on every run and changes nothing about how you use the tool. (Developers: the fingerprint is the spec_hash over a canonicalized config object; see For developers and dev-docs/headline-experimentation/spec.md §5.7.)

Rate limits + budget ↑ top

Usage is capped to keep the shared spend in check. All caps are visible in the quota pills at the top of the page, which turn amber as you approach a limit and red when you hit one. The pills track generations (one per Generate or Regenerate):

When you hit a cap the tool returns the relevant code—ERR_RATE_DAILY, ERR_RATE_MONTHLY, ERR_WEEKEND, or ERR_BUDGET—with a plain-language recovery message. Counting is atomic: concurrent runs can't over-shoot the limits, and a run that fails because its background job was killed is automatically refunded against your daily/monthly count (the team budget isn't refunded, since the model calls genuinely happened).

Your full history is one click away—My usage log shows a per-day breakdown and your recent generations, read straight from the server for your authenticated email only.

What gets stored ↑ top

The tool persists three layers, so the long-term study and the honest A/B comparison have the data they need:

Anonymization ↑ top

Colleague names never reach the model and never leave the tool. The denylist is loaded at request time from a Cloudflare Secret (DENYLIST_JSON)—never bundled into the build, never committed to git. Your inputs (headline, keywords, article, also-avoid, expert cite) are scrubbed before being sent to the model, and outputs are scrubbed again before being returned. Pierce is the only permitted name. If a generated option contains a denylist name, it's dropped from the pool—the same handling as a hard-rule grader failure. A name appearing in the article body also changes the receipt fingerprint, so a mid-test denylist rotation can't silently merge two different-configuration runs.

For developers ↑ top

JS ↔ Python grader parity contract

The Variant Worker ports generate_grader.py's rule-based criteria to JS at workers/variant/src/grader.js. The port is guarded by tests/test_grader_js_parity.py, which runs a fixture of synthetic headlines through both implementations and asserts identical pass/fail per criterion; any drift fails CI. The formula taxonomy (FORMULA_TAXONOMY in grader.js) mirrors formula_taxonomy.py, shared with the Findings classifier and the daily grader.

Two parity caveats specific to the variant tool:

Worker endpoints

POST  /generate-async           —enqueue a generation job; returns a job_id in <1s (the client polls)
POST  /generate                 —synchronous full pipeline (auth + rate-limit; atomic slot reservation)
POST  /regenerate               —alias for /generate; client passes a fresh nonce
GET   /job/:id                  —poll an async job (stage + terminal status complete | failed)
POST  /select                   —Keep (idempotent); clears unkept_at on re-keep; backfills offered+fingerprint
POST  /unselect                 —soft-delete a kept variant (cross-generation via seed_fingerprint; email-scoped)
POST  /outcome                  —link a kept selection → published article (rejects soft-deleted selections)
GET   /healthz                  —DB ping + binding check (unauthenticated)
GET   /healthz/deep             —additionally pings Workers AI for a live latency check (unauthenticated)
GET   /seed/:fingerprint/kept   —list active kept variants for a seed (cross-generation hydration)
GET   /metrics/quota-mine       —caller's own usage + team capacity (auth required)
GET   /metrics/my-history?days= —caller's own per-day call log + recent generations
GET   /metrics/quota-by-user    —admin: per-user usage today + month (token-gated)
GET   /metrics/cohort-counts    —selections by cohort (lifetime + currently-active)
GET   /metrics/diversity-rejection-rate?since=
GET   /export/offered-candidates?since=   —canonical export (rule-following + wildcard, every gate)
GET   /export/wildcards-offered?since=     —legacy wildcard-only export
GET   /export/selections?since=&active_only=&include_emails=&token=  —kept decisions (token reveals email)

Generation pipeline (in order)

/generate-async enqueues a Cloudflare Queue job; the consumer runs the pipeline in a dedicated invocation with a full runtime budget (the durable fix for the prior background-timeout class). A job that fails to ack goes to a dead-letter queue and is terminalized + quota-refunded—never re-run, so no double-charge. The pipeline stages the client surfaces as a progress label: queued → generating → scoring & grading → filtering for diversity → regenerating to fix gaps → saving → done.

Model assignments

Set in wrangler.toml by prefix: names starting with @cf/ route to Cloudflare Workers AI (free); anything else routes to Anthropic (paid, billed against the team budget). The current deploy uses Claude Sonnet for rule-following generation and the wildcard, and Claude Haiku for grading. Embeddings (the cosine similarity) use Workers AI bge-large-en-v1.5 at the edge—no third-party round-trip.

The config-spec receipt (developer view)

The receipt is the spec_hash: a SHA-256 over a canonicalized JSON config object (canonicalJSONStringify in util.js recursively sorts keys so key order can't change the hash). It captures the headline type, expert delivery mode, article-mode, the active grader thresholds, the verbatim volatile rule-set content, the models, the worker build SHA, and a content-hash of the active denylist. It's written INSERT OR IGNORE to variant_config_specs and stamped on both variant_generations and each variant_offered_candidates row, so any single headline reconstructs its exact config without a join. Full field-by-field rationale: dev-docs/headline-experimentation/spec.md §5.7.

Refinement loop

scripts/analyze_variant_cohort.py reads /export/selections + /export/offered-candidates and emits docs/data/variant-refinement.json (programmatic findings) and docs/variant/refinement-report.html (operator summary with candidate new formats). Significance gates before a pattern is surfaced: ≥30 wildcards, ≥100 rule-following selections, ≥5 wildcards per pattern cluster, and a bootstrap confidence interval on the PV ratio that excludes 1.0.

Source

See also the comprehensive Part IV—Variant tool section of the main docs site (architecture · D1 schema · cohort study · troubleshooting · glossary).