The Variant Tool (/variant/) takes a headline you already wrote and hands back 3–10 (typically 5–10 after quality filtering) publication-ready alternative phrasings of the same article, plus one deliberately off-pattern "wildcard." You keep the ones you like and discard the rest. Every standard option is pre-checked against the same automated grader the Grader page runs, so anything offered already clears the quality bar. The tool can also run two opt-in levers: a headline style test (how the headline frames a trend) and an expert lever (surfacing a real expert or study the article names). Both are off by default; if you don't touch them, the tool behaves exactly as it always has.
This page is the full operator reference. For shared concepts the whole console uses—page-views, lift, vertical, the formula taxonomy, the grader's criteria—see the main docs site. For the technical architecture + D1 schema behind this tool, see Part IV—Variant tool of the main docs. Plain-language definitions here match the in-tool ? help bubbles word-for-word.
/variant/.vagus nerve, anxiety). Every option the tool offers will contain all of them. Capitalization doesn't matter.Top to bottom, here is every input and exactly what it does. Required fields are marked; the Generate button stays disabled until the seed headline (≥20 chars) and at least one primary keyword are filled.
| Field | Required? | What it does |
|---|---|---|
| Seed headline | Yes | The headline you wrote and want alternatives for. The tool offers rephrasings of the same article. 20–500 characters. |
| Primary keywords | Yes | The must-have SEO terms. Every option contains all of them (a firm editorial rule). Comma-separate. If requiring a term would box the options in too tightly, move it to Secondary instead. |
| Secondary keywords | No | Related terms that earn bonus credit when a variant naturally includes them, but are never required—so they don't constrain the wording. |
| Article text | No (recommended) | Paste the full article. This switches the tool into "maximize difference" mode (covered below): options are pushed further apart from each other, and each is fact-checked against the article instead of just your headline. With the article, options can name a specific technique, finding, expert, or comparison that's in the article but not in your headline. Up to 50,000 characters. |
| Expert lever (checkbox + "which expert" field) | No | Surfaces a real expert or study the article names—the single biggest proven boost to reads. See The expert lever. |
| Also avoid (Advanced) | No | Paste prior headlines you've already published, one per line. The new options are pushed to be distinct from all of them—handy for sibling articles in the same cluster. |
| Format (Advanced) | No | If the article follows a finalized content format (e.g. Everything to Know), pick it so the tool grades against that format's headline rule instead of the general one. Leave on general otherwise. |
| Persona (Advanced) | No | The reader to write for: Discover Browser (someone casually scrolling Google Discover who has to be hooked) or Curious Optimizer (TH) (an already-engaged Trend Hunter reader). Leave on none if neither fits. |
| Platform (Advanced) | No | Where the headline runs, if the destination has its own length rule: Apple News (90–120 chars), SmartNews (70–90), Trend Hunter B2C (90–104), MSN, Google News. Leave on general (about 90–104) for a standard headline. |
| Outlet (Advanced) | No | The specific publication, if it has its own length rule. US Weekly, Woman's World, and Life & Style use a tighter 90–100. Leave on the default (90–104) for everything else. |
| Headline type (TEST) | No | The style test: how the headline frames where a trend is in its life. None = a normal headline, nothing changes. See The headline style test. |
About the dropdowns: only finalized formats, personas, platforms, and outlets are listed. If yours isn't there, leave the dropdown on its default—the general H1 rules apply (about 90–104 characters, no platform-specific constraint).
The Headline type dropdown (marked TEST) runs an experiment: it writes your headline in one of three framings, depending on where a trend sits in its life. The goal is to learn, from real reader behavior over time, which framing performs best. It changes the headline wording only—your article body is untouched.
Leaving it on "None" gives you today's tool, unchanged. The style test is entirely opt-in.
| Type | The angle | How the headline sounds |
|---|---|---|
EstablishedCTRL |
The settled, authority-backed take. The trend is recognized; you're explaining it. This is also the experimental baseline. | "…Explained" · "What the Research Says…" · "Scientists Confirm…" |
EmergingEMG |
The trend is just catching on; the reader is ahead of the curve. Don't overstate it. | "Just Starting to…" · "Is Growing" · "Why Experts Are Increasingly…" |
ForecastFWD |
Look ahead: where this is headed next. | "What Comes Next…" · "Where [X] Is Headed" · "What This Means for You" |
Within a cluster—the set of articles you write for one trend—give each article a different style. Never repeat a style inside a cluster. If a trend gets three articles, make one Established, one Emerging, one Forecast. That balance is what lets the data tell whether a style drove a result rather than just the topic. This rule lives in your workflow, not in the tool—the tool generates one type per run and has no view of the rest of the cluster.
When you've picked a Headline type, each result card adds a small framing chip so you can see, at a glance, whether the option actually carries the wording you asked for:
The framing chip is a labeling tag, not a quality flag—it never changes the score. The style under test is an open hypothesis, so the tool refuses to let an unproven framing pull down a headline's grade or block it from the pool. The tag just records whether the option carried the chosen wording, so your "which style won" comparison later is clean—an "off-pattern" option is a perfectly good headline, just not a clean data point for that arm. (The expert lever below is different—it is scored, because it's already proven.)
Surfacing a real, credentialed expert or a specific study in the headline is the single biggest proven lever for reads—the June 2026 headline analysis measured it as the highest-impact finding by a wide margin. The expert lever turns that finding on for any generation. It sits just under the article body, because the expert has to come from the article.
It has two controls, which work together:
Typing a name in the cite field turns the lever on by itself—you don't also need to tick the box. The tool resolves the controls by precedence: a named cite wins (it's the most specific instruction), then the checkbox, then an always-on background scan of the article for expert/study signals. So the cite field takes precedence over the checkbox: name an expert and the lever fires regardless of the box's state.
The expert or study must actually appear in the article body you pasted. The tool will never invent one. If your article names no expert and states no study or statistic, you simply get a normal headline—no fabricated credential, name, study, year, or percentage. This is enforced in the generation prompt itself (a body-only anti-fabrication rule) and is treated as a hard failure if violated, the same anti-invention floor the tool already applies to every factual claim.
Because the expert lever is a proven result rather than an open hypothesis, it is scored and enforced: when you flag an expert and the article supports one, a variant that fails to surface it is penalized by the grader (it carries real weight in the score). That's the deliberate opposite of the style test's framing chip, which is never scored. Both still share the same floor: the tool can only surface what the article actually contains.
Each option comes back as a card: the headline, a row of small chips, a collapsible grader breakdown, and Copy + Keep buttons. An always-visible legend at the top of the results repeats these definitions, and every chip also carries the same plain explanation on hover or long-press. Here is the full set.
| Chip | Looks like | What it means |
|---|---|---|
| score | score 92 | Quality score, 0–100, from the same automated grader the Grader page uses. Higher is better. Every standard (non-wildcard) option shown scored 85 or above—the tool already dropped anything lower—so use this to rank the options, not to decide pass/fail (they all passed). |
| cosine (vs your headline) | cosine 0.62 vs your headline | How close this option is in meaning to the headline you pasted. Cosine = meaning-similarity on a 0.00–1.00 scale: 0.00 = totally different, 1.00 = basically identical. Lower means a more genuinely different rewrite. Near-duplicates were already filtered out. |
| cosine (vs avoid-list) | cosine 0.41 vs avoid-list | Only appears if you used Also avoid or the tool auto-avoided prior picks from the same cluster. It's the meaning-similarity to the closest headline on that avoid-list, so you can confirm this option doesn't collide with something you've already published. Lower is safer. |
| chars | 98 chars | The option's exact character count, spaces included. The general target band is about 90–104. Options that run slightly over (up to ~115) now come through carrying an amber ⚑ flag instead of being dropped—trim or ship at your judgment. A chosen Platform or Outlet narrows the band (e.g. Apple News 90–120). |
| ⚑ flag | ⚑ 6 over target | An editorial-judgment flag: the option cleared every hard check but trips a relaxable preference you should weigh—today, running a few characters over the 104 target (within ~11). The tool surfaces it so you decide whether to trim or keep it. It's never a defect and never lowered the score. More relaxable checks may flag here over time. |
| wildcard | ⚡ wildcard | The one experimental, deliberately off-pattern option per batch (only the card tagged ⚡). It's checked for hard rules and accuracy but skips the style judgments the others pass, so its score isn't comparable to theirs. Read it as a creative prompt and verify it editorially before publishing. |
| framing ✓ / ⓘ | ✓ FWD framing ⓘ FWD: off-pattern | Only appears when you picked a Headline type. Green ✓ = the option carries the trend-stage wording you chose; ⓘ off-pattern = it doesn't use that typical wording, so it's tagged for the rolling framing study. Labeling tag, not a quality flag—it never changes the score, and an off-pattern option is still a fine headline. (See the style test.) |
| accuracy (wildcards only) | accuracy: article-supported | The accuracy safety-net note for a wildcard: the grader's short reason for judging it factually consistent with your headline (and the article, if you pasted one). Wildcards skip the style checks, so this is the one substantive check they get. |
Click the grader breakdown line on any card to expand the per-rule detail behind its score—the same line-by-line readout the Grader page shows:
Rules cover length, named-entity lead, formula, keyword presence, active voice, no lead burial, curiosity gap, factual accuracy, the punctuation ban, and—when you flagged one—the expert/study signal. Rules that don't apply (e.g. an Apple News title length when no Apple News title was supplied, or the expert check when no expert was flagged) show · and drop out of the score entirely; they neither help nor hurt.
The small grey row under the "Variants" heading is diagnostics—you don't need to act on it. It reports what happened during the run:
Every standard option offered for selection has already cleared two bars, so you should never see a low-quality option:
Candidates that fail either bar drop out, and the pipeline regenerates with explicit failure feedback for up to two extra passes to refill the set. If it still can't reach your requested count, you get fewer options plus a "what to try" panel and a one-click Regenerate—not filler.
One exception: the ⚡ wildcard is intentionally allowed to skip the style judgments (see below), so its score isn't comparable. It still passes every hard structural rule plus an accuracy check.
Every generation returns exactly one option tagged ⚡ wildcard. It's written under a different prompt that's encouraged to depart structurally from a typical headline for your format and persona. It must still pass every hard rule (length, named-entity lead, no banned patterns) plus a single accuracy safety-net check—but it intentionally bypasses the rule-following style judgments (active voice, lead burial, curiosity), because those would fight its job of being structurally novel.
Verify wildcards editorially before publishing. They're off-pattern by design—read each as a creative prompt, not a vetted option.
The wildcard isn't only for you. Every wildcard offered—kept or not—feeds a long-term study aimed at discovering new high-performing headline formats. scripts/analyze_variant_cohort.py periodically scans the accumulated wildcards, clusters them by structural fingerprint (formula × leading part-of-speech × part-of-speech sequence (the grammatical shape of how the words are arranged)), and surfaces any pattern whose pick-rate or post-publish performance materially beats the rule-following baseline. Patterns that clear the significance bars become candidates for promotion into the standard format library.
The options are alternatives for you to pick from, not a set published together. You keep one (or a few) and discard the rest, so two options sharing a leading word, a formula, or some content tokens is fine—they're competing, not co-published. This is a key way the Variant Tool differs from the routine grader of live headlines, which evaluates published output against editorial diversity standards. Different setting, different rules.
There's one nuance. When you paste the article body, the tool switches into "maximize difference" mode and does push the standard options apart from each other—because the article gives the model real, fact-supported latitude (different facets, mechanisms, audiences, comparisons) to diverge legitimately. Without an article, no pairwise diversity gate applies between options.
Concerns about cross-publication similarity—sibling headlines in one cluster looking alike on the live site—are handled by the Also avoid input plus automatic cluster auto-avoid: prior picks from the same cluster are routed around via cosine. Here are the gates that are always active:
| Gate | Threshold | What it catches |
|---|---|---|
| Cosine vs your seed headline | < 0.97 (0.93 in article-mode) | Near-literal duplicates of the source. Tighter in article-mode, where the article gives options room to legitimately diverge. |
| Cosine vs each "also avoid" entry | < 0.97 | Collisions with headlines you've already published from the same cluster. |
| Pairwise cosine + word-overlap between standard options | cosine < 0.85, jaccard < 0.45 (article-mode only) | Forces facet diversity across the set. Jaccard = word-overlap (the share of distinct words two headlines have in common, 0–1). |
| Wildcard cosine vs seed | < 0.95 | The wildcard's own ceiling—the cohort study values structurally new patterns over ones that are merely different in meaning. |
| Grader hard rules | 100% pass (length: soft over-tolerance) | Named-entity lead, formula, primary keywords, banned patterns—any miss = silent drop. Applies to every option including the wildcard. Length is the one band with give: up to ~11 chars over the 104 target soft-passes with a ⚑ flag; only options past ~115 (or under the 75 floor) drop. |
| Grader score (standard options) | ≥ 85 | Active voice, no lead burial, curiosity gap, accuracy. Tolerates one borderline miss, rejects substantive ones. |
| Formula cap (per call) | ≤ 2 per formula | No single headline formula hogs the set. If three or more options share a formula, the lowest-scoring excess are dropped. Wildcards count toward the cap. |
Every headline the tool generates records the exact settings that produced it—think of it as a receipt. That includes which style framing was used, whether the expert lever fired and how, the article-vs-headline-only mode, the active grader thresholds, the models, and the exact rule wording in force at the time. The receipt is content-addressed: identical settings produce an identical short fingerprint, and any change to those settings produces a different one.
Why this matters: when reader-performance data arrives weeks later, the eventual "which style or expert mode won" comparison can be honest, because each headline points back to the precise configuration behind it—rather than guessing which wording was live when it ran. Two headlines with the same fingerprint provably ran under the same settings; two with different fingerprints provably differed.
This is data capture, not a control you operate: it happens automatically on every run and changes nothing about how you use the tool. (Developers: the fingerprint is the spec_hash over a canonicalized config object; see For developers and dev-docs/headline-experimentation/spec.md §5.7.)
Usage is capped to keep the shared spend in check. All caps are visible in the quota pills at the top of the page, which turn amber as you approach a limit and red when you hit one. The pills track generations (one per Generate or Regenerate):
RATE_DAILY). Resets at midnight UTC.RATE_MONTHLY). Resets on the 1st of the month, UTC.BUDGET_MONTHLY_USD) in estimated model spend—roughly 700 runs at the current per-call cost. Everyone draws from this one budget; the team pill shows the estimated runs left for the whole team. A Slack alert fires at 80% (BUDGET_ALERT_FRACTION), and again at 100%.UNLIMITED_USER_EMAILS)—listed people (Pierce is on this list) bypass the per-user daily/monthly caps and the weekend gate. Their usage still counts and shows in the pills for transparency; they're just never blocked.When you hit a cap the tool returns the relevant code—ERR_RATE_DAILY, ERR_RATE_MONTHLY, ERR_WEEKEND, or ERR_BUDGET—with a plain-language recovery message. Counting is atomic: concurrent runs can't over-shoot the limits, and a run that fails because its background job was killed is automatically refunded against your daily/monthly count (the team budget isn't refunded, since the model calls genuinely happened).
Your full history is one click away—My usage log shows a per-day breakdown and your recent generations, read straight from the server for your authenticated email only.
The tool persists three layers, so the long-term study and the honest A/B comparison have the data they need:
Colleague names never reach the model and never leave the tool. The denylist is loaded at request time from a Cloudflare Secret (DENYLIST_JSON)—never bundled into the build, never committed to git. Your inputs (headline, keywords, article, also-avoid, expert cite) are scrubbed before being sent to the model, and outputs are scrubbed again before being returned. Pierce is the only permitted name. If a generated option contains a denylist name, it's dropped from the pool—the same handling as a hard-rule grader failure. A name appearing in the article body also changes the receipt fingerprint, so a mid-test denylist rotation can't silently merge two different-configuration runs.
The Variant Worker ports generate_grader.py's rule-based criteria to JS at workers/variant/src/grader.js. The port is guarded by tests/test_grader_js_parity.py, which runs a fixture of synthetic headlines through both implementations and asserts identical pass/fail per criterion; any drift fails CI. The formula taxonomy (FORMULA_TAXONOMY in grader.js) mirrors formula_taxonomy.py, shared with the Findings classifier and the daily grader.
Two parity caveats specific to the variant tool:
CHAR_DEFAULT = 90–104, the "TH Headline Analysis 2026-06-02" sweet spot) is what both the daily grader and the parity test grade against. The variant tool passes its own charRangeOverride at runtime, realigned (2026-06-05) to the same 90–104 target so the live tool, the daily grader, and the parity harness agree on the band. On top of that the variant tool adds an 11-char editorial over-tolerance (VARIANT_CHAR_RANGE.over_tol, 2026-06-16): 105–115-char options soft-pass carrying an amber ⚑ flag rather than dropping. This is variant-tool-only—the daily grader and the parity harness carry no over_tol, so both still hard-fail over target. The Python grader has no override hook—it's pinned at 90–104—so neither the band nor the over-tolerance can be mirrored to Python.active_voice, no_lead_burial, curiosity, and accurate are evaluated by a batched model call in llm.js, not by grader.js. The expert_signal criterion is a scored, parity-locked rule (weight 2, deterministic regex, N/A when no expert is flagged or present); the framing-conformance check (validateFraming) is JS-only and deliberately never enters score().POST /generate-async —enqueue a generation job; returns a job_id in <1s (the client polls) POST /generate —synchronous full pipeline (auth + rate-limit; atomic slot reservation) POST /regenerate —alias for /generate; client passes a fresh nonce GET /job/:id —poll an async job (stage + terminal status complete | failed) POST /select —Keep (idempotent); clears unkept_at on re-keep; backfills offered+fingerprint POST /unselect —soft-delete a kept variant (cross-generation via seed_fingerprint; email-scoped) POST /outcome —link a kept selection → published article (rejects soft-deleted selections) GET /healthz —DB ping + binding check (unauthenticated) GET /healthz/deep —additionally pings Workers AI for a live latency check (unauthenticated) GET /seed/:fingerprint/kept —list active kept variants for a seed (cross-generation hydration) GET /metrics/quota-mine —caller's own usage + team capacity (auth required) GET /metrics/my-history?days= —caller's own per-day call log + recent generations GET /metrics/quota-by-user —admin: per-user usage today + month (token-gated) GET /metrics/cohort-counts —selections by cohort (lifetime + currently-active) GET /metrics/diversity-rejection-rate?since= GET /export/offered-candidates?since= —canonical export (rule-following + wildcard, every gate) GET /export/wildcards-offered?since= —legacy wildcard-only export GET /export/selections?since=&active_only=&include_emails=&token= —kept decisions (token reveals email)
/generate-async enqueues a Cloudflare Queue job; the consumer runs the pipeline in a dedicated invocation with a full runtime budget (the durable fix for the prior background-timeout class). A job that fails to ack goes to a dead-letter queue and is terminalized + quota-refunded—never re-run, so no double-charge. The pipeline stages the client surfaces as a progress label: queued → generating → scoring & grading → filtering for diversity → regenerating to fix gaps → saving → done.
Set in wrangler.toml by prefix: names starting with @cf/ route to Cloudflare Workers AI (free); anything else routes to Anthropic (paid, billed against the team budget). The current deploy uses Claude Sonnet for rule-following generation and the wildcard, and Claude Haiku for grading. Embeddings (the cosine similarity) use Workers AI bge-large-en-v1.5 at the edge—no third-party round-trip.
The receipt is the spec_hash: a SHA-256 over a canonicalized JSON config object (canonicalJSONStringify in util.js recursively sorts keys so key order can't change the hash). It captures the headline type, expert delivery mode, article-mode, the active grader thresholds, the verbatim volatile rule-set content, the models, the worker build SHA, and a content-hash of the active denylist. It's written INSERT OR IGNORE to variant_config_specs and stamped on both variant_generations and each variant_offered_candidates row, so any single headline reconstructs its exact config without a join. Full field-by-field rationale: dev-docs/headline-experimentation/spec.md §5.7.
scripts/analyze_variant_cohort.py reads /export/selections + /export/offered-candidates and emits docs/data/variant-refinement.json (programmatic findings) and docs/variant/refinement-report.html (operator summary with candidate new formats). Significance gates before a pattern is surfaced: ≥30 wildcards, ≥100 rule-following selections, ≥5 wildcards per pattern cluster, and a bootstrap confidence interval on the PV ratio that excludes 1.0.
workers/variant/ (src/index.js pipeline · src/grader.js criteria · src/llm.js prompts · src/util.js rate limits / cosine / jaccard / scrub · wrangler.toml caps + models)docs/variant/ + docs/css/variant.css + docs/js/variant.jsdev-docs/headline-experimentation/spec.md (style test · expert lever · the receipt)scripts/analyze_variant_cohort.py · Parity test: tests/test_grader_js_parity.py