Scale Engine SOP
Repeatable frameworks, checklists and SOPs for building thousands of pages from data — templates, guardrails against thin content, and the QA steps that keep you indexed.
The pre-publish index-coverage gate

Most pSEO teams ship first, then watch Search Console fill with "Crawled - currently not indexed." Reverse the order. Gate every template at the page level before scale.

Run this on 50 sample URLs:
☐ Step 1 — Render check. Fetch each URL with JavaScript disabled (a plain HTTP GET) and confirm the primary content exists in the raw HTML, not only after JS rendering. Gate: fail if main entity data is client-side only.
☐ Step 2 — Uniqueness diff. Shingle the body (5-word windows) across the 50 pages. Gate: fail if median cross-page overlap exceeds 40%.
☐ Step 3 — Internal demand. Each page must have at least 3 incoming internal links from non-template pages. Gate: fail any page under 3; an orphan is an automatic fail.
☐ Step 4 — Query coverage. Pull the head term per page from your keyword map. Gate: fail if 0 search demand in your data source.
☐ Step 5 — Canonical self-reference. Gate: fail if canonical points elsewhere.
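
Steps 1 and 5 can both be checked against the raw HTML response, before any JavaScript runs. A minimal sketch, assuming you already have the HTML string for each sample URL and a list of entity values that must appear in it; check_raw_html and its arguments are illustrative names, and the canonical comparison is an exact string match.

```python
from html.parser import HTMLParser


class _PageProbe(HTMLParser):
    """Collects visible text and the rel=canonical href from raw HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <script>/<style> is not rendered content, so skip it.
        if not self._skip_depth:
            self.text_parts.append(data)


def check_raw_html(html, page_url, required_values):
    """Return render and canonical gate results for one page's raw HTML."""
    probe = _PageProbe()
    probe.feed(html)
    text = " ".join(" ".join(probe.text_parts).split()).lower()
    missing = [v for v in required_values if v.lower() not in text]
    return {
        "render_ok": not missing,                    # Step 1 gate
        "missing": missing,
        "canonical_ok": probe.canonical == page_url,  # Step 5 gate
    }
```

A page whose entity data only appears inside a script payload fails render_ok, which is the client-side-only case the gate is meant to catch.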

Only templates that pass all five at sample scale earn the right to generate 10,000 pages.
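
The Step 2 uniqueness diff can be sketched in a few lines. This assumes "overlap" means Jaccard similarity between the 5-word shingle sets of two page bodies, and that the gate looks at the median over all page pairs; the post does not name a metric, so treat both as assumptions. Function names are illustrative.

```python
import re
from itertools import combinations
from statistics import median


def shingles(text, k=5):
    """Set of k-word windows from the lowercased body text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Shared shingles divided by all shingles across both pages."""
    if not a and not b:
        return 1.0  # two empty bodies are treated as identical
    return len(a & b) / len(a | b)


def median_overlap(bodies, k=5):
    """Median pairwise overlap across the sample pages."""
    if len(bodies) < 2:
        raise ValueError("need at least two pages to compare")
    sets = [shingles(body, k) for body in bodies]
    return median(jaccard(a, b) for a, b in combinations(sets, 2))


def passes_uniqueness_gate(bodies, threshold=0.40, k=5):
    """Gate from Step 2: fail if median overlap exceeds the threshold."""
    return median_overlap(bodies, k) <= threshold
```

On 50 sample pages this compares 1,225 pairs, which is cheap enough to run on every template change.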

Ship gate: don't publish until all boxes are checked.
Hub-page architecture: the spine that holds 10k leaf pages

Leaf pages without a hub structure are an orphan farm. Design the spine first; generate leaves into it.

The three-layer spine:
— Layer 1 — Pillar (1 per major dimension): /loans/. Owner-action: editorial, hand-built, links down to all hubs. Gate: must exist before any leaf.
— Layer 2 — Hubs (1 per category value): /loans/{type}/. Auto-generated, lists and links to its leaves, links up to the pillar. Gate: a hub with under 5 qualifying leaves should not exist — fold it up.
— Layer 3 — Leaves: /loans/{type}/{state}/. Each links up to its hub and across to siblings.

Rules:
☐ Every leaf is reachable from the pillar in ≤2 clicks.
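☐ Hubs paginate at 100 leaves with plain, crawlable <a href> links between pages (rel="next"/"prev" is optional; Google has said it no longer uses it for indexing), not infinite scroll that hides links from crawlers.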
☐ A hub's own content includes a real summary, not just the link list, or it's a thin page itself.
☐ Empty hubs 404 or 301 up — never a blank index.
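
The hub gate and the pagination rule can be applied at generation time. A minimal sketch, assuming each record is a dict with "type" and "state" keys; build_spine is a hypothetical helper, and what happens to leaves from folded hubs (link them from the pillar, or hold them) is left to the caller.

```python
from collections import defaultdict

MIN_LEAVES = 5    # hub gate: under 5 qualifying leaves, no hub
PAGE_SIZE = 100   # pagination rule: 100 leaves per hub page


def build_spine(records, pillar="/loans/"):
    """Group leaf URLs under hubs, skipping hubs that are too small."""
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec["type"]].append(f'{pillar}{rec["type"]}/{rec["state"]}/')

    hubs, folded = {}, []
    for loan_type, leaves in sorted(by_type.items()):
        leaves.sort()
        if len(leaves) < MIN_LEAVES:
            folded.extend(leaves)  # no hub is created for this category
            continue
        # Split the hub's link list into pages of PAGE_SIZE leaves each.
        pages = [leaves[i:i + PAGE_SIZE] for i in range(0, len(leaves), PAGE_SIZE)]
        hubs[f"{pillar}{loan_type}/"] = pages
    return hubs, folded
```

Because a hub only exists if it appears in the returned dict, an empty or undersized hub can never be rendered as a blank index.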

Ship gate: don't publish until all boxes are checked.
Data-source validation runs before the template, not after

Your pages are only as trustworthy as the feed behind them. Treat the data source as the first QA gate.

Pre-ingest checklist for any new feed:
☐ Step 1 — Freshness stamp. Every record carries a last_verified date. Gate: reject the feed if more than 20% of records are older than your refresh SLA.
☐ Step 2 — Null-rate ceiling per field. Gate: any field above 30% null is demoted to optional and removed from titles/H1s.
☐ Step 3 — Outlier clamp. Numeric fields get min/max bounds. Gate: a $0 or $9,999,999 price flags the record for hold, not publish.
☐ Step 4 — Canonical naming. Map source values to your controlled vocabulary (state codes, currency, units) before render. Gate: fail any unmapped enum.
☐ Step 5 — Dedupe key. Define the composite key that makes a record unique. Gate: fail the batch if duplicate keys exist.

Guardrail: ingest writes to a staging table. Production reads only records that passed all five. A failed record never silently becomes a thin page.
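
The five pre-ingest gates can run as one pass over the staging batch. A minimal sketch, assuming records are dicts with a last_verified date; validate_feed and its parameters are hypothetical, and the 20% and 30% thresholds are the ones from the checklist.

```python
from collections import Counter
from datetime import date, timedelta


def validate_feed(records, today: date, sla_days, title_fields, bounds, enums, key_fields):
    """Apply the five pre-ingest gates to one staging batch."""
    if not records:
        raise ValueError("empty feed")
    n = len(records)

    # Step 1: share of records older than the refresh SLA (no stamp counts as stale).
    cutoff = today - timedelta(days=sla_days)
    stale = sum(1 for r in records
                if r.get("last_verified") is None or r["last_verified"] < cutoff)

    # Step 2: fields above 30% null are demoted out of titles/H1s.
    demoted = [f for f in title_fields
               if sum(1 for r in records if r.get(f) in (None, "")) / n > 0.30]

    # Step 5: duplicate composite keys fail the whole batch.
    key_counts = Counter(tuple(r.get(f) for f in key_fields) for r in records)
    duplicates = [k for k, c in key_counts.items() if c > 1]

    # Steps 3 and 4 are per record: out-of-bounds numbers or unmapped enums go on hold.
    passed, held = [], []
    for r in records:
        out_of_bounds = any(r.get(f) is not None and not (lo <= r[f] <= hi)
                            for f, (lo, hi) in bounds.items())
        unmapped = any(r.get(f) not in allowed for f, allowed in enums.items())
        (held if out_of_bounds or unmapped else passed).append(r)

    feed_ok = stale / n <= 0.20 and not duplicates
    return {
        "feed_ok": feed_ok,
        "demoted_fields": demoted,
        "held": held,
        "publishable": passed if feed_ok else [],  # production reads only this list
    }
```

Production would read only the publishable list, so a rejected feed or a held record never reaches a template.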

Ship gate: don't publish until all boxes are checked.
Failure mode: pages reachable only via the sitemap

Programmatic sets often have no internal path to the individual page, so the sitemap is the only entry point. URLs discovered only through the sitemap tend to be treated as low priority and are often crawled slowly or not at all. They look orphaned because they are.

Internal-link gate:
— Step 1. Owner: dev. Every generated page must receive at least 3 internal links from other crawlable pages (hub, siblings, related-by-data).
— Step 2. Owner: SEO. Run a crawl starting from the homepage only — no sitemap. Gate: every published URL is discoverable within 4 clicks.
— Step 3. Owner: dev. Build sibling/related modules from the data (same category, nearby geo, similar attribute), not random.
— Step 4. Owner: SEO. Report orphan count after each deploy. Gate: orphan count = 0.

Guardrail: the sitemap is a hint, never the primary discovery path.
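
Steps 1, 2 and 4 reduce to a breadth-first walk over the internal link graph, starting from the homepage and ignoring the sitemap. A minimal sketch, assuming a crawler has already produced a dict mapping each page to the pages it links to; the function names and the links structure are illustrative.

```python
from collections import deque


def click_depths(links, start="/"):
    """Breadth-first search: minimum clicks from start to each reachable page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def internal_link_report(links, published, start="/", max_depth=4, min_inlinks=3):
    """Check every published URL against the internal-link gates."""
    depth = click_depths(links, start)

    # Count distinct linking pages per published URL (self-links ignored).
    inlinks = {url: 0 for url in published}
    for source, targets in links.items():
        for target in set(targets):
            if target in inlinks and target != source:
                inlinks[target] += 1

    orphans = sorted(u for u in published if u not in depth)
    too_deep = sorted(u for u in published if depth.get(u, 0) > max_depth)
    under_linked = sorted(u for u, n in inlinks.items() if n < min_inlinks)
    return {
        "orphans": orphans,            # Step 4 gate: must be empty
        "too_deep": too_deep,          # Step 2 gate: must be empty
        "under_linked": under_linked,  # Step 1 gate: must be empty
        "passes": not (orphans or too_deep or under_linked),
    }
```

Running this after each deploy gives the orphan count directly as len(report["orphans"]).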

Ship gate: don't publish until all boxes are checked.


Related: @CrawlAndRender (covers crawl efficiency)
Staged rollout to starve thin pages before they ship

Don't release a full template set at once. Use a 3-tier release valve so weak pages never reach the index.

— Tier 1 (data-rich): pages where your source has 8+ populated fields. Action: publish immediately, submit in sitemap.
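— Tier 2 (partial): 4-7 fields. Action: publish with noindex,follow. They stay out of the index until enriched while their links can still be crawled. Don't park pages here indefinitely: Google has indicated that links on long-term noindex pages may eventually stop being followed.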
— Tier 3 (sparse): under 4 fields. Action: do not render. Return 404 or fold into a parent hub.

The guardrail is a single field-count function in your template:
☐ Count non-null, non-boilerplate fields per record.
☐ Map count to tier.
☐ Tier drives the robots directive automatically — no human decides per page.

Weekly job: re-score Tier 2 pages. When a record crosses into 8+ fields, flip to index and add to the next sitemap. Rollback path: any page dropping below threshold flips back to noindex same day.
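
The field-count function and tier mapping might look like the sketch below. The 8+ / 4-7 / under 4 thresholds come from the tiers above; the BOILERPLATE placeholder set and the function names are assumptions to adapt to your own feed.

```python
# Assumed placeholder values that should not count as real data.
BOILERPLATE = {"", "n/a", "unknown", "tbd"}

ROBOTS = {1: "index,follow", 2: "noindex,follow", 3: None}


def populated_field_count(record):
    """Count non-null, non-boilerplate fields in one record."""
    count = 0
    for value in record.values():
        if value is None:
            continue
        if isinstance(value, str) and value.strip().lower() in BOILERPLATE:
            continue
        count += 1
    return count


def tier_for(record):
    """Map field count to release tier: 8+ is 1, 4-7 is 2, under 4 is 3."""
    n = populated_field_count(record)
    if n >= 8:
        return 1
    if n >= 4:
        return 2
    return 3


def render_decision(record):
    """Tier drives rendering, the robots directive and sitemap inclusion."""
    tier = tier_for(record)
    return {
        "tier": tier,
        "render": tier != 3,       # Tier 3 is never rendered
        "robots": ROBOTS[tier],
        "in_sitemap": tier == 1,   # only indexable pages are submitted
    }
```

The weekly re-score is then just render_decision run again over every record: a page that gains or loses fields changes tier, and its directive follows without manual review.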

Ship gate: don't publish until all boxes are checked.
URL pattern spec: lock it before the first page exists

URL changes after launch are the most expensive rollback in pSEO. Write the spec once, freeze it.

The pattern contract:
☐ Step 1 — One variable per path segment. /loans/{type}/{state}, never /loans/{type}-in-{state}. Gate: fail if a segment encodes two dimensions.
☐ Step 2 — Slug source is immutable. Derive from a stable ID, not the display name. When "New York City" becomes "NYC" in your data, the URL must not move. Gate: fail if slug derives from a mutable field.
☐ Step 3 — Casing and separators fixed: lowercase, hyphen, no trailing slash. Gate: fail any uppercase or underscore.
☐ Step 4 — Reserved-word guard. Check every generated slug against existing routes (/about, /api) and rename or drop colliding values before render. Gate: fail on collision.
☐ Step 5 — Max one optional segment, and it must 301 to the canonical short form.

Guardrail: a unit test that generates 1,000 slugs from sample data and asserts zero duplicates and zero reserved-word hits.
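
That unit test could be sketched as follows. make_slug, the RESERVED set and the sample ID format are all hypothetical; the point is that slugs come from stable IDs and the three assertions cover duplicates, reserved words and the casing/separator rule.

```python
import re

RESERVED = {"about", "api"}  # assumed list of existing top-level routes
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def make_slug(stable_id):
    """Lowercase, hyphen-separated slug derived from a stable ID."""
    return re.sub(r"[^a-z0-9]+", "-", str(stable_id).lower()).strip("-")


def check_slug_contract(sample_ids):
    """Assert zero duplicates, zero reserved-word hits, and valid format."""
    slugs = [make_slug(i) for i in sample_ids]
    assert len(slugs) == len(set(slugs)), "duplicate slugs generated"
    assert not RESERVED & set(slugs), "slug collides with a reserved route"
    assert all(SLUG_RE.match(s) for s in slugs), "slug breaks casing/separator rules"
    return slugs


# Example run over 1,000 sample IDs (the ID format here is made up).
sample = [f"LOC_{n:04d}" for n in range(1000)]
check_slug_contract(sample)
```

Two IDs that differ only in punctuation or case (for example "a_b" and "A-B") collapse to the same slug and trip the duplicate assertion, which is exactly the case worth catching before launch.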

Ship gate: don't publish until all boxes are checked.