Scale Engine SOP
Repeatable frameworks, checklists and SOPs for building thousands of pages from data — templates, guardrails against thin content, and the QA steps that keep you indexed.
URL pattern spec: lock it before the first page exists

URL changes after launch are the most expensive rollback in pSEO. Write the spec once, freeze it.

The pattern contract:
☐ Step 1 — One variable per path segment. /loan/{type}/{state}, never /loan/{type}-in-{state}. Gate: fail if a segment encodes two dimensions.
☐ Step 2 — Slug source is immutable. Derive from a stable ID, not the display name. When "New York City" becomes "NYC" in your data, the URL must not move. Gate: fail if slug derives from a mutable field.
☐ Step 3 — Casing and separators fixed: lowercase, hyphen, no trailing slash. Gate: fail any uppercase or underscore.
☐ Step 4 — Reserved-word guard. Reject slug values that collide with existing routes (/about, /api). Gate: fail on collision.
☐ Step 5 — Max one optional segment, and it must 301 to the canonical short form.

Guardrail: a unit test that generates 1,000 slugs from sample data and asserts zero duplicates and zero reserved-word hits.
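
A minimal sketch of that unit test, assuming the /loan/{type}/{state} pattern from Step 1. The names build_path, check_slug_set, RESERVED and the synthetic sample records are hypothetical; swap in your own routes and sample data.

```python
import re

RESERVED = {"about", "api"}  # assumed: first path segments your app already routes
SEGMENT_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")  # lowercase words joined by hyphens


def build_path(type_slug: str, state_slug: str) -> str:
    """Build /loan/{type}/{state}; slugs are assumed to be stored against stable IDs."""
    for segment in (type_slug, state_slug):
        if not SEGMENT_RE.fullmatch(segment):
            raise ValueError(f"bad casing or separator: {segment!r}")
        if segment in RESERVED:
            raise ValueError(f"reserved word: {segment!r}")
    return f"/loan/{type_slug}/{state_slug}"  # no trailing slash


def check_slug_set(records: list[tuple[str, str]]) -> list[str]:
    """Generate every path and raise if two records map to the same URL."""
    paths = [build_path(type_slug, state_slug) for type_slug, state_slug in records]
    if len(set(paths)) != len(paths):
        raise ValueError("duplicate URLs generated")
    return paths


# 20 types x 50 states = 1,000 synthetic records standing in for sample data
sample = [(f"type-{i % 20}", f"state-{i // 20}") for i in range(1000)]
assert len(check_slug_set(sample)) == 1000
```

The regex covers Step 3 only. Step 1 (one dimension per segment) and Step 2 (immutable slug source) are properties of your data model, so they need a review of where the slug comes from, not a string check.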

Ship gate: don't publish until all boxes are checked.
The internal-link injection SOP (deterministic, not random)

Random "related links" blocks leak crawl budget and link to dead-ends. Make linking a deterministic function of your data graph.

For each generated page, inject links in this fixed order:
— 1 link up to the parent hub (the {category} index). Owner: template. Gate: must exist.
— 3 sibling links to the nearest neighbors on your primary dimension (e.g. adjacent price tiers, same city). Owner: a ranked-neighbor query. Gate: siblings must themselves be indexable.
— 2 cross-dimension links (same {type}, different {region}). Gate: skip any target that is noindex.
— 1 link to the highest-authority page in the cluster (your money page).

Rules that keep it clean:
☐ Never link to a page that links back identically — break reciprocal loops.
☐ Anchor text pulls the target's H1 token, not a generic "click here."
☐ Cap total in-template links at 7 to avoid dilution.
☐ Run a monthly orphan report; any page with under 3 inbound internal links gets force-added to a neighbor's block.
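
One way to sketch the fixed-order injection as a pure function of the data. The Page fields, the rank-distance rule for "nearest neighbor", and the URL sort for cross-dimension picks are all assumptions made for illustration, not a prescribed ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    h1: str
    category: str  # key of the parent hub
    type: str
    region: str
    rank: int      # position on the primary dimension, e.g. price tier
    indexable: bool = True


def build_links(page: Page, pages: list[Page], hub: Page, money: Page) -> list[tuple[str, str]]:
    """Return (url, anchor) pairs in the order: hub, 3 siblings, 2 cross-dimension, money page."""
    if hub is None:
        raise ValueError("parent hub must exist")
    others = [p for p in pages if p.url != page.url and p.indexable]  # never link to noindex
    siblings = sorted(
        (p for p in others if p.category == page.category and p.region == page.region),
        key=lambda p: (abs(p.rank - page.rank), p.url),  # url breaks ties, so output is stable
    )[:3]
    cross = sorted(
        (p for p in others if p.type == page.type and p.region != page.region),
        key=lambda p: p.url,
    )[:2]
    links, seen = [], {page.url}
    for target in [hub, *siblings, *cross, money]:
        if target.url not in seen:  # skip duplicates and self-links
            seen.add(target.url)
            links.append((target.url, target.h1))  # anchor text comes from the target's H1
    return links[:7]  # cap from the rules above
```

Because every sort key ends in the URL, the same dataset always produces the same block regardless of the order records arrive in.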

Ship gate: don't publish until all boxes are checked.
Crawl-budget release SOP for 100k+ page sets

Dumping 100,000 URLs into one sitemap teaches Googlebot nothing about priority. Meter the release.

The rollout schedule:
☐ Step 1 — Split sitemaps by tier, not by alphabet. sitemap-priority.xml (proven-demand pages) ships first and alone.
☐ Step 2 — Cap week-one exposure at the count your log files show Googlebot already crawls daily, times 5. Gate: don't exceed it.
☐ Step 3 — Watch the index ratio: indexed URLs (from Search Console) ÷ crawled URLs (from your log files). Gate: hold the next batch until the ratio is above 0.7.
☐ Step 4 — Release subsequent tiers only when the prior tier's index rate stabilizes for 7 days.
☐ Step 5 — Keep a lastmod that is honest. Faking it to trigger recrawl burns trust and crawl budget.

Guardrail: a daily job parses access logs and alerts if crawl requests to thin tiers exceed 15% of bot hits — a sign Google is wasting budget on pages you should have gated.
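
The three numeric gates above reduce to a few lines of arithmetic. This sketch assumes you have already extracted Googlebot request paths from your access logs and that thin tiers live under known path prefixes; the function names and the /thin/ prefix in the usage lines are hypothetical.

```python
def week_one_cap(daily_googlebot_crawls: int, multiplier: int = 5) -> int:
    """Step 2: maximum number of URLs to expose in week one."""
    return daily_googlebot_crawls * multiplier


def may_release_next_batch(indexed: int, crawled: int, threshold: float = 0.7) -> bool:
    """Step 3: hold the next batch until indexed / crawled is above the threshold."""
    if crawled == 0:
        return False  # nothing crawled yet, so there is no evidence to release on
    return indexed / crawled > threshold


def thin_tier_alert(bot_paths: list[str], thin_prefixes: tuple[str, ...], limit: float = 0.15) -> bool:
    """Guardrail: True when thin-tier URLs take more than `limit` of bot hits."""
    if not bot_paths:
        return False
    thin_hits = sum(1 for path in bot_paths if path.startswith(thin_prefixes))
    return thin_hits / len(bot_paths) > limit


# Example: Googlebot crawls about 400 URLs a day, so week one exposes at most 2,000.
cap = week_one_cap(400)
alert = thin_tier_alert(["/thin/a", "/thin/b"] + ["/loan/x"] * 8, ("/thin/",))
```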

Ship gate: don't publish until all boxes are checked.
Guardrail: the duplicate-title scanner

At scale, near-duplicate titles are the quiet killer — "Best {x} in {city}" times 5,000 reads as one page to a clustering algorithm. Install a hard guard.

The title-uniqueness routine:
☐ Step 1 — Generate all titles in a dry run, no publish.
☐ Step 2 — Strip the variable tokens, hash the static skeleton. Gate: if 100% of titles share one skeleton with only the city swapped, the template fails. Inject a second varying data point (rating, count, year).
☐ Step 3 — Levenshtein-cluster the full title strings. Gate: fail any cluster where more than 50 titles sit within edit-distance 5 of each other.
☐ Step 4 — Enforce a length band of 50-60 characters AFTER token substitution, using the longest real value, not the average. Gate: fail if the max-length value truncates.
☐ Step 5 — Meta descriptions get the same scan, with a 30-character minimum unique span per page.

Guardrail: this scanner runs in CI on every template change, not just at launch.
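
A rough sketch of Steps 2 to 4 for a dry-run title list. It assumes each rendered title arrives with the token values that were substituted into it; scan_titles and skeleton_hash are hypothetical names, and the neighbor count is a simple stand-in for real clustering (it is quadratic, so sample large sets).

```python
import hashlib


def skeleton_hash(title: str, values: dict[str, str]) -> str:
    """Swap each substituted value back to its token name, then hash what is left."""
    for token, value in sorted(values.items(), key=lambda kv: -len(kv[1])):
        title = title.replace(value, "{" + token + "}")  # naive: assumes values don't occur in static text
    return hashlib.sha1(title.encode("utf-8")).hexdigest()


def levenshtein(a: str, b: str) -> int:
    """Two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def scan_titles(rendered, max_cluster=50, max_dist=5, band=(50, 60)) -> list[str]:
    """rendered: list of (title, token_values). Returns failure messages; empty means pass."""
    failures = []
    titles = [title for title, _ in rendered]
    first = rendered[0][1]
    skeletons = {skeleton_hash(title, values) for title, values in rendered}
    varying = {k for _, values in rendered for k, v in values.items() if first.get(k) != v}
    if len(skeletons) == 1 and len(varying) <= 1:
        failures.append("one skeleton with at most one varying token")
    for title in titles:
        near = sum(1 for other in titles if levenshtein(title, other) <= max_dist)  # includes itself
        if near > max_cluster:
            failures.append(f"{near} titles within edit distance {max_dist} of {title!r}")
            break
    out_of_band = [t for t in titles if not band[0] <= len(t) <= band[1]]
    if out_of_band:
        failures.append(f"{len(out_of_band)} titles outside the {band[0]}-{band[1]} character band")
    return failures
```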

Ship gate: don't publish until all boxes are checked.
The promotion ladder: how a page earns the index

Indexing should be earned, not granted at birth. Run every generated page up a ladder.

The rungs (a page sits on the highest rung whose gate it fully clears):
— Rung 0 — Rendered, noindex,follow, not in sitemap. Default for every new page. It passes link equity, collects internal links, stays invisible to search.
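— Rung 1 — Promote to index + sitemap when required fields are complete AND (at least 1 organic impression OR 3 inbound internal links). A noindex page cannot earn impressions, so first-time promotion rides on the link signal; impressions count for pages climbing back after a demotion. Gate: the data signal plus at least one demand signal.
— Rung 2 — Add to priority sitemap when the page holds page-2 visibility or better for any query for 14 consecutive days. Gate: sustained, not a one-day spike.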
— Rung 3 — Link from the money page when it converts or ranks top-10.

Demotion is automatic:
☐ Field count drops below required → back to Rung 0.
☐ Zero impressions in 90 days → back to Rung 0, remove from sitemap.

Guardrail: the ladder is a nightly job, not a manual review. No page promotes itself.
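
A sketch of the rung assignment the nightly job could run per page. The PageStats fields are hypothetical, the Rung 1 condition is read as fields AND (impressions OR links), and the rungs are treated as cumulative (a page must clear Rung 2 before Rung 3), which is an assumption about the ladder rather than something the rules spell out.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    fields_complete: bool
    impressions_90d: int
    inbound_links: int
    days_indexed: int        # 0 while the page is still on Rung 0
    days_on_page_two: int    # consecutive days at page-2 visibility or better
    converts_or_top10: bool


def assign_rung(s: PageStats) -> int:
    """Return the rung (0-3) a page should sit on after tonight's run."""
    # Demotion rules run first so they always win over promotion.
    if not s.fields_complete:
        return 0
    if s.days_indexed >= 90 and s.impressions_90d == 0:
        return 0
    # Rung 1: data signal AND (impression signal OR internal-link signal).
    if s.impressions_90d < 1 and s.inbound_links < 3:
        return 0
    # Rung 2: sustained visibility for 14 days.
    if s.days_on_page_two < 14:
        return 1
    # Rung 3: earns a link from the money page.
    return 3 if s.converts_or_top10 else 2
```

The job then applies side effects per rung (robots meta, sitemap membership, money-page link) from this single return value, so no template decides indexing on its own.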

Ship gate: don't publish until all boxes are checked.
SOP: generate schema from the same record that renders the page

Hand-written JSON-LD drifts from visible content at scale, and markup that doesn't match the page can cost you rich-result eligibility or draw a structured-data manual action. Bind schema to the source.

The binding rules:
☐ Step 1 — One serializer per entity type. The Product page and its Product schema read the same record object. Gate: no field appears in JSON-LD that isn't on the page.
☐ Step 2 — Null-safe by construction. A missing price omits the offers node entirely. Gate: never emit "price": null or a placeholder.
☐ Step 3 — No invented review counts or ratings. Gate: aggregateRating renders only when real review data exists for that record.
☐ Step 4 — Validate in CI. Run the structured-data test on 20 sampled records per template. Gate: fail the build on any error, not warning-and-ship.
☐ Step 5 — Type honesty. A list page is ItemList or CollectionPage, not Article. Gate: type must match page intent.

Guardrail: a monthly diff comparing visible fields to schema fields flags drift before Google does.
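
A minimal sketch of a null-safe Product serializer in the spirit of Steps 2 and 3. The record keys (name, price, currency, rating, review_count) are hypothetical; the schema.org property names (offers, priceCurrency, aggregateRating, ratingValue, reviewCount) are the standard ones.

```python
import json


def product_jsonld(record: dict) -> str:
    """Serialize the same record the template renders; omit any node whose data is missing."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": record["name"],
    }
    price, currency = record.get("price"), record.get("currency")
    if price is not None and currency:  # no price means no offers node at all
        data["offers"] = {"@type": "Offer", "price": str(price), "priceCurrency": currency}
    rating, reviews = record.get("rating"), record.get("review_count") or 0
    if rating is not None and reviews > 0:  # only real review data produces a rating
        data["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": reviews,
        }
    return json.dumps(data, sort_keys=True)
```

Keeping the omission logic inside the serializer means a template author cannot emit "price": null by accident; the only way to get an offers node is to have a price on the record.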

Ship gate: don't publish until all boxes are checked.
Faceted navigation: the index-bloat firewall

Filters and sorts multiply URLs combinatorially. Three filters with ten options each is a thousand crawlable variants of one page. Build the firewall before you build the filters.

The control matrix — decide per parameter, once:
☐ Indexable facets: the 1-2 dimensions with real search demand (e.g. {category}, {city}). Clean path URLs, indexed, in sitemap.
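☐ Non-indexable facets: sort, view, page-size, in-stock toggles. Action: query string + canonical to the unfiltered version; switch to noindex only where the filtered page differs too much for the canonical to hold, since pairing noindex with a canonical sends mixed signals. Gate: never a crawlable path.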
☐ Combination cap: indexable only for single-facet and the 5 highest-demand two-facet pairs. Everything else canonicalizes up. Gate: fail any 3-facet URL that returns 200 and indexable.
☐ Parameter order: enforce a canonical order so ?a=1&b=2 and ?b=2&a=1 don't become two URLs.
☐ Internal links never point at non-indexable facet URLs.

Guardrail: a crawl of your own filters that asserts the indexable URL count matches your matrix, not the combinatorial total.
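
A sketch of two pieces of the matrix: canonical parameter ordering and the combination cap. The parameter names in NON_INDEXABLE and the helper names are hypothetical, and is_indexable takes the set of active facet names however your router extracts them (path segments or query keys).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

NON_INDEXABLE = {"sort", "view", "page_size", "in_stock"}  # assumed parameter names


def canonical_url(url: str) -> str:
    """Drop non-indexable parameters and sort the rest so order never creates a second URL."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in NON_INDEXABLE)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def is_indexable(facets: set[str], allowed_pairs: set[frozenset[str]]) -> bool:
    """Single facets pass; two-facet pages pass only if the pair is on the allow-list."""
    if facets & NON_INDEXABLE:
        return False
    if len(facets) <= 1:
        return True
    return len(facets) == 2 and frozenset(facets) in allowed_pairs


pairs = {frozenset({"category", "city"})}  # stand-in for your 5 highest-demand pairs
assert canonical_url("https://example.com/shoes?b=2&a=1") == "https://example.com/shoes?a=1&b=2"
assert not is_indexable({"category", "city", "brand"}, pairs)
```

The guardrail crawl can then count URLs where is_indexable is true and compare that number with the total your matrix predicts.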

Ship gate: don't publish until all boxes are checked.
The decay-detection SOP for generated page sets

pSEO pages rot silently — a data feed goes stale, a competitor refreshes, rankings slide across 5,000 pages at once. Run scheduled detection.

Monthly decay job:
☐ Step 1 — Pull 90-day-over-90-day clicks and impressions per URL pattern, grouped by template. Gate: flag any template losing both metrics by 20%+.
☐ Step 2 — Cross-reference data freshness. Gate: if decay correlates with stale last_verified dates, the fix is the feed, not the copy.
☐ Step 3 — Sample the SERP for 10 decayed pages. Gate: if a SERP feature (AI overview, pack) now owns the query, mark the template for format change, not refresh.
☐ Step 4 — Triage. Refresh data, rewrite the differentiation block, or retire to noindex. Each decayed page gets exactly one disposition.
☐ Step 5 — Re-submit only refreshed pages in the sitemap with an honest lastmod.

Guardrail: track "pages refreshed vs pages decayed" as a rolling ratio. If you're refreshing slower than decay, freeze new generation until you catch up.
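
Step 1 and the freeze guardrail can be sketched as below, assuming you have already aggregated clicks and impressions per template for the current and previous 90-day windows. The function names and the template keys in the usage lines are hypothetical.

```python
def decayed_templates(
    current: dict[str, tuple[int, int]],
    previous: dict[str, tuple[int, int]],
    drop_pct: int = 20,
) -> list[str]:
    """Each dict maps template -> (clicks, impressions) for one 90-day window.

    A template is flagged only when BOTH metrics fell by at least drop_pct percent.
    """
    flagged = []
    for template, (prev_clicks, prev_impr) in previous.items():
        if prev_clicks == 0 or prev_impr == 0:
            continue  # no baseline to decay from
        clicks, impr = current.get(template, (0, 0))
        # Integer arithmetic avoids float rounding right at the 20% boundary.
        clicks_down = clicks * 100 <= prev_clicks * (100 - drop_pct)
        impr_down = impr * 100 <= prev_impr * (100 - drop_pct)
        if clicks_down and impr_down:
            flagged.append(template)
    return sorted(flagged)


def should_freeze_generation(pages_refreshed: int, pages_decayed: int) -> bool:
    """Guardrail: freeze new generation while refresh is slower than decay."""
    return pages_refreshed < pages_decayed


previous = {"loan-state": (1000, 20000), "loan-city": (500, 9000)}
current = {"loan-state": (700, 15000), "loan-city": (300, 8500)}
flagged = decayed_templates(current, previous)  # loan-city lost clicks but kept impressions
```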

Ship gate: don't publish until all boxes are checked.
The differentiation budget: how much must be unique per page

"Unique content" is too vague to gate on. Set a numeric budget and enforce it in the template.

The per-page budget:
— At least 35% of rendered words must come from record-specific fields, not the shared skeleton. Gate: a word-source tagger fails any page below 35%.
— At least 1 computed value that no sibling shares: a ratio, a delta vs. category average, a rank. Gate: this number must change across pages or the block is boilerplate.
— At least 1 data point in the H1 or first 100 words. Gate: the opening must not be identical across the set.

How to enforce:
☐ Tag every template token as STATIC or DYNAMIC at build time.
☐ Render 100 sample pages, measure the DYNAMIC ratio.
☐ If a template can't clear 35%, the dimension is too thin — merge records into fewer, richer pages instead of generating many empty ones.
☐ Re-measure on every template edit; static additions silently erode the ratio.
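
The DYNAMIC ratio is simple to measure once tokens are tagged. This sketch assumes the renderer can emit each page as a list of (kind, text) segments; dynamic_ratio, failing_pages and computed_value_varies are hypothetical names.

```python
def dynamic_ratio(segments: list[tuple[str, str]]) -> float:
    """segments: (kind, text) pairs where kind is "STATIC" or "DYNAMIC". Returns the word share of DYNAMIC."""
    counts = {"STATIC": 0, "DYNAMIC": 0}
    for kind, text in segments:
        counts[kind] += len(text.split())
    total = counts["STATIC"] + counts["DYNAMIC"]
    return counts["DYNAMIC"] / total if total else 0.0


def failing_pages(pages: dict[str, list[tuple[str, str]]], minimum: float = 0.35) -> list[str]:
    """Return the URLs of sampled pages that fall below the differentiation budget."""
    return sorted(url for url, segments in pages.items() if dynamic_ratio(segments) < minimum)


def computed_value_varies(values: list[float]) -> bool:
    """The computed block is boilerplate if the number is the same on every sampled page."""
    return len(set(values)) > 1


sample = {
    "/loan/auto/texas": [("STATIC", "Compare rates for"), ("DYNAMIC", "auto loans in Texas"), ("STATIC", "today")],
    "/loan/auto/ohio": [("STATIC", "one two three four five six seven"), ("DYNAMIC", "auto loans Ohio")],
}
below_budget = failing_pages(sample)  # the Ohio page is 3 dynamic words out of 10
```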

Ship gate: don't publish until all boxes are checked.
Sitemap hygiene SOP for dynamic page sets

A sitemap that lists noindex pages, 404s, or redirects sends conflicting signals at scale. Treat it as a managed artifact, regenerated nightly.

The generation contract:
☐ Step 1 — Source of truth is the promotion ladder, not the route table. Only pages on Rung 1 or higher get listed. Gate: a noindex URL in the sitemap fails CI.
☐ Step 2 — Status pre-check. Sample-fetch entries; any non-200 is excluded. Gate: fail the build if more than 1% of a sample returns errors.
☐ Step 3 — Shard at 45,000 URLs (under the 50k limit, with headroom) and register all shards in a sitemap index.
☐ Step 4 — Honest lastmod from the record's real update timestamp. Gate: no blanket "today" stamps.
☐ Step 5 — Diff vs. yesterday. Log added/removed URLs. A sudden 10k drop should alert, not ship silently.

Guardrail: monthly, reconcile sitemap URL count against Search Console's indexed count. A widening gap means the firewall upstream is leaking thin pages.
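
Steps 3 to 5 can be sketched with the standard library alone. The function names are hypothetical, and the entries are assumed to be (loc, lastmod) pairs already filtered to indexable, 200-status pages.

```python
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
SHARD_SIZE = 45_000  # headroom under the 50,000-URL limit per sitemap file


def shard(entries: list, size: int = SHARD_SIZE) -> list[list]:
    """Split the full entry list into sitemap-sized chunks."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def sitemap_xml(entries: list[tuple[str, str]]) -> str:
    """entries: (loc, lastmod) pairs; lastmod comes from the record's real update timestamp."""
    rows = "".join(
        f"<url><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod></url>"
        for loc, lastmod in entries
    )
    return f'<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="{NS}">{rows}</urlset>'


def sitemap_index_xml(shard_urls: list[str]) -> str:
    """Register every shard in one sitemap index."""
    rows = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in shard_urls)
    return f'<?xml version="1.0" encoding="UTF-8"?><sitemapindex xmlns="{NS}">{rows}</sitemapindex>'


def diff(today: set[str], yesterday: set[str], alert_at: int = 10_000) -> dict:
    """Log added/removed counts and raise the alert flag on a large drop."""
    added, removed = today - yesterday, yesterday - today
    return {"added": len(added), "removed": len(removed), "alert": len(removed) >= alert_at}
```

With 100,000 entries, shard returns three lists of 45,000, 45,000 and 10,000 URLs.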

Ship gate: don't publish until all boxes are checked.