The pre-publish index-coverage gate
Most pSEO teams ship first, then watch Search Console fill with "Crawled - currently not indexed." Reverse the order. Gate every template at the page level before scale.
Run this on 50 sample URLs:
☐ Step 1 — Render check. Fetch each URL without executing JavaScript and confirm the primary content is present in the raw HTML, not only after rendering. Gate: fail if main entity data is client-side only.
☐ Step 2 — Uniqueness diff. Shingle the body (5-word windows) across the 50 pages. Gate: fail if median cross-page overlap exceeds 40%.
☐ Step 3 — Internal demand. Each page must have at least 3 incoming internal links from non-template pages. Gate: fail any orphan.
☐ Step 4 — Query coverage. Pull the head term per page from your keyword map. Gate: fail if the term shows zero search demand in your data source.
☐ Step 5 — Canonical self-reference. Gate: fail if canonical points elsewhere.
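Step 5 is easy to automate on the raw HTML you already fetched for Step 1. A minimal sketch, assuming you compare the canonical href to the page URL as an exact string (no normalization); `CanonicalFinder` and `canonical_gate` are hypothetical names, not part of any SEO tool:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def canonical_gate(page_url: str, raw_html: str) -> bool:
    """Pass only if there is exactly one canonical and it equals page_url."""
    finder = CanonicalFinder()
    finder.feed(raw_html)
    return finder.canonicals == [page_url]
```

A missing canonical and multiple conflicting canonicals both fail here. That is stricter than the gate as written, which only fails a canonical pointing elsewhere, so relax it if your templates legitimately omit the tag.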
Only templates that pass all five at sample scale earn the right to generate 10,000 pages.
Ship gate: don't publish until all boxes are checked.
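The Step 2 uniqueness diff can be sketched in a few lines. This version assumes "overlap" means Jaccard similarity between the 5-word shingle sets of two page bodies, and that bodies are already stripped of HTML; both are assumptions, and `uniqueness_gate` is a hypothetical helper name:

```python
from itertools import combinations
from statistics import median


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-word windows in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets (assumed definition of overlap)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def uniqueness_gate(
    bodies: list[str], k: int = 5, ceiling: float = 0.40
) -> tuple[bool, float]:
    """Return (passed, median pairwise overlap) for a sample of page bodies."""
    sets = [shingles(body, k) for body in bodies]
    scores = [overlap(x, y) for x, y in combinations(sets, 2)]
    med = median(scores) if scores else 0.0
    return med <= ceiling, med
```

With 50 sample pages this compares 1,225 pairs, which is cheap. Run it on the body only: if shared navigation and footer text is included, the overlap is inflated by markup every page carries.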
Hub-page architecture: the spine that holds 10k leaf pages
Leaf pages without a hub structure are an orphan farm. Design the spine first; generate leaves into it.
The three-layer spine:
— Layer 1 — Pillar (1 per major dimension): /loans/. Owner-action: editorial, hand-built, links down to all hubs. Gate: must exist before any leaf.
— Layer 2 — Hubs (1 per category value): /loans/{type}/. Auto-generated, lists and links to its leaves, links up to the pillar. Gate: a hub with under 5 qualifying leaves should not exist — fold it up.
— Layer 3 — Leaves: /loans/{type}/{state}/. Each links up to its hub and across to siblings.
Rules:
☐ Every leaf is reachable from the pillar in ≤2 clicks.
☐ Hubs paginate at 100 leaves with rel sequencing, not infinite scroll that hides links from crawlers.
☐ A hub's own content includes a real summary, not just the link list, or it's a thin page itself.
☐ Empty hubs 404 or 301 up — never a blank index.
☐ Empty hubs 404 or 301 up — never a blank index.
Ship gate: don't publish until all boxes are checked.
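The hub gates above (minimum 5 leaves, pagination at 100) can be applied before any page is rendered. A rough sketch, assuming leaves arrive as (type, state) pairs and that "fold it up" means listing the affected leaves on the pillar instead of on a hub; `plan_hubs` is a hypothetical name:

```python
from collections import defaultdict

MIN_LEAVES = 5    # Layer 2 gate: hubs with fewer qualifying leaves are folded up
PAGE_SIZE = 100   # pagination rule for hub link lists


def plan_hubs(
    leaves: list[tuple[str, str]],
) -> tuple[dict[str, list[list[str]]], list[str]]:
    """Group leaf URLs under hubs; return (hub URL -> pages of leaf URLs, folded leaves)."""
    by_type: dict[str, list[str]] = defaultdict(list)
    for loan_type, state in leaves:
        by_type[loan_type].append(f"/loans/{loan_type}/{state}/")

    hubs: dict[str, list[list[str]]] = {}
    folded: list[str] = []
    for loan_type, urls in sorted(by_type.items()):
        urls.sort()  # stable order so pagination does not reshuffle between builds
        if len(urls) < MIN_LEAVES:
            folded.extend(urls)  # assumption: these get linked from the pillar
            continue
        hubs[f"/loans/{loan_type}/"] = [
            urls[i:i + PAGE_SIZE] for i in range(0, len(urls), PAGE_SIZE)
        ]
    return hubs, folded
```

Because a type with no leaves never appears in `by_type`, this plan cannot produce an empty hub; the 404-or-301 rule still has to cover hubs that become empty after launch.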
Data-source validation runs before the template, not after
Your pages are only as trustworthy as the feed behind them. Treat the data source as the first QA gate.
Pre-ingest checklist for any new feed:
☐ Step 1 — Freshness stamp. Every record carries a last_verified date. Gate: reject the feed if more than 20% of records are older than your refresh SLA.
☐ Step 2 — Null-rate ceiling per field. Gate: any field above 30% null is demoted to optional and removed from titles/H1s.
☐ Step 3 — Outlier clamp. Numeric fields get min/max bounds. Gate: a $0 or $9,999,999 price flags the record for hold, not publish.
☐ Step 4 — Canonical naming. Map source values to your controlled vocabulary (state codes, currency, units) before render. Gate: fail any unmapped enum.
☐ Step 5 — Dedupe key. Define the composite key that makes a record unique. Gate: fail the batch if duplicate keys exist.
Guardrail: ingest writes to a staging table. Production reads only records that passed all five. A failed record never silently becomes a thin page.
Ship gate: don't publish until all boxes are checked.
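The five pre-ingest gates fit in one batch validator that runs against the staging table. The sketch below assumes records are plain dicts with `last_verified` stored as a `date`, and that a record with no `last_verified` counts as stale; the function name, the report keys, and the bounds and vocabulary passed in are all hypothetical:

```python
from collections import Counter
from datetime import date, timedelta


def validate_batch(
    records: list[dict],
    today: date,
    sla_days: int,
    bounds: dict[str, tuple[float, float]],
    vocab: dict[str, set[str]],
    key_fields: tuple[str, ...],
) -> dict:
    """Run the five pre-ingest gates and return a report; nothing is published here."""
    report = {"reject_feed": False, "demoted_fields": [], "held": [], "errors": []}
    if not records:
        report["reject_feed"] = True
        report["errors"].append("empty batch")
        return report
    total = len(records)

    # Step 1: freshness. Reject the feed if more than 20% of records are stale.
    cutoff = today - timedelta(days=sla_days)
    stale = sum(
        1 for r in records
        if r.get("last_verified") is None or r["last_verified"] < cutoff
    )
    if stale / total > 0.20:
        report["reject_feed"] = True
        report["errors"].append(f"{stale}/{total} records older than SLA")

    # Step 2: null-rate ceiling. Fields above 30% null are demoted to optional.
    for field in sorted({name for r in records for name in r}):
        nulls = sum(1 for r in records if r.get(field) is None)
        if nulls / total > 0.30:
            report["demoted_fields"].append(field)

    # Step 3: outlier clamp. Out-of-bounds values put the record on hold.
    for index, r in enumerate(records):
        for field, (low, high) in bounds.items():
            value = r.get(field)
            if value is not None and not (low <= value <= high):
                report["held"].append(index)
                break

    # Step 4: canonical naming. Any value outside the controlled vocabulary fails.
    for field, allowed in vocab.items():
        bad = {r[field] for r in records if r.get(field) is not None} - allowed
        if bad:
            report["errors"].append(f"unmapped {field}: {sorted(map(str, bad))}")

    # Step 5: dedupe key. Duplicate composite keys fail the batch.
    counts = Counter(tuple(r.get(k) for k in key_fields) for r in records)
    dupes = [key for key, n in counts.items() if n > 1]
    if dupes:
        report["errors"].append(f"duplicate keys: {dupes}")
    return report
```

Production would then read only from batches where `reject_feed` is false and `errors` is empty, skipping the indices listed in `held`.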
Failure mode: pages reachable only via the sitemap
Programmatic sets often have no internal path to the individual page, so the sitemap is the only entry point. Google tends to give sitemap-only URLs low crawl priority, so they are crawled slowly or not at all. They look orphaned because they are.
Internal-link gate:
— Step 1. Owner: dev. Every generated page must receive at least 3 internal links from other crawlable pages (hub, siblings, related-by-data).
— Step 2. Owner: SEO. Run a crawl starting from the homepage only — no sitemap. Gate: every published URL is discoverable within 4 clicks.
— Step 3. Owner: dev. Build sibling/related modules from the data (same category, nearby geo, similar attribute), not random.
— Step 4. Owner: SEO. Report orphan count after each deploy. Gate: orphan count = 0.
Guardrail: the sitemap is a hint, never the primary discovery path.
Ship gate: don't publish until all boxes are checked.
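Steps 2 and 4 reduce to a breadth-first search over the internal link graph, starting at the homepage and ignoring the sitemap. A minimal sketch, assuming you already have the graph as a dict of page URL to outgoing internal links (for example from a crawler export); `crawl_report` and its report keys are hypothetical:

```python
from collections import deque


def crawl_report(
    links: dict[str, list[str]],
    home: str,
    published: set[str],
    max_clicks: int = 4,
    min_inbound: int = 3,
) -> dict[str, list[str]]:
    """Report orphans, too-deep pages, and under-linked pages among published URLs."""
    # Breadth-first search gives the minimum click depth from the homepage.
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)

    # Count distinct linking pages, but only sources a crawler can actually reach.
    inbound: dict[str, set[str]] = {url: set() for url in published}
    for source, targets in links.items():
        if source not in depth:
            continue
        for target in targets:
            if target in inbound and target != source:
                inbound[target].add(source)

    return {
        "orphans": sorted(u for u in published if u not in depth),
        "too_deep": sorted(u for u in published if u in depth and depth[u] > max_clicks),
        "under_linked": sorted(u for u in published if len(inbound[u]) < min_inbound),
    }
```

Run it after each deploy and fail the release when `orphans` or `too_deep` is non-empty.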
—
Related: @CrawlAndRender (covers crawl efficiency)
Staged rollout to starve thin pages before they ship
Don't release a full template set at once. Use a 3-tier release valve so weak pages never reach the index.
— Tier 1 (data-rich): pages where your source has 8+ populated fields. Action: publish immediately, submit in sitemap.
— Tier 2 (partial): 4-7 fields. Action: publish with noindex,follow. They pass link equity but stay out of the index until enriched.
— Tier 3 (sparse): under 4 fields. Action: do not render. Return 404 or fold into a parent hub.
The guardrail is a single field-count function in your template:
☐ Count non-null, non-boilerplate fields per record.
☐ Map count to tier.
☐ Tier drives the robots directive automatically — no human decides per page.
Weekly job: re-score Tier 2 pages. When a record crosses into 8+ fields, flip to index and add to the next sitemap. Rollback path: any page dropping below threshold flips back to noindex same day.
Ship gate: don't publish until all boxes are checked.
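The field-count guardrail might look like the sketch below. It assumes a record is a flat dict and that "boilerplate" means a small set of placeholder strings; the `BOILERPLATE` set, the function names, and the shape of the returned decision are hypothetical:

```python
# Placeholder values that should not count as populated (hypothetical list).
BOILERPLATE = {"", "n/a", "tbd", "unknown"}


def populated_fields(record: dict) -> int:
    """Count non-null, non-boilerplate fields in a record."""
    count = 0
    for value in record.values():
        if value is None:
            continue
        if isinstance(value, str) and value.strip().lower() in BOILERPLATE:
            continue
        count += 1
    return count


def tier_for(count: int) -> int:
    """Map a field count to a release tier: 8+ is 1, 4-7 is 2, under 4 is 3."""
    if count >= 8:
        return 1
    if count >= 4:
        return 2
    return 3


def publish_decision(record: dict) -> dict:
    """Derive status code, robots directive, and sitemap inclusion from the tier."""
    tier = tier_for(populated_fields(record))
    if tier == 1:
        return {"tier": 1, "status": 200, "robots": "index,follow", "in_sitemap": True}
    if tier == 2:
        return {"tier": 2, "status": 200, "robots": "noindex,follow", "in_sitemap": False}
    return {"tier": 3, "status": 404, "robots": None, "in_sitemap": False}
```

The weekly re-score and the same-day rollback are then the same call: rerun `publish_decision` on the current record and apply whatever it returns.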
URL pattern spec: lock it before the first page exists
URL changes after launch are the most expensive rollback in pSEO. Write the spec once, freeze it.
The pattern contract:
☐ Step 1 — One variable per path segment: /loan/{type}/{state}, never /loan/{type}-in-{state}. Gate: fail if a segment encodes two dimensions.
☐ Step 2 — Slug source is immutable. Derive from a stable ID, not the display name. When "New-York City" becomes "NYC" in your data, the URL must not move. Gate: fail if slug derives from a mutable field.
☐ Step 3 — Casing and separators fixed: lowercase, hyphen, no trailing slash. Gate: fail any uppercase or underscore.
☐ Step 4 — Reserved-word guard. Strip values that collide with existing routes (/about, /api). Gate: fail on collision.
☐ Step 5 — Max one optional segment, and it must 301 to the canonical short form.
Guardrail: a unit test that generates 1,000 slugs from sample data and asserts zero duplicates and zero reserved-word hits.
Ship gate: don't publish until all boxes are checked.
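Steps 3 and 4 and the guardrail test can be expressed as a small validator. This is a sketch under assumptions: slugs are stored once per stable ID and never regenerated from display names, the reserved list is just the two routes named above, and the guard is applied to every segment. The names `validate_segment`, `build_path`, and `check_slugs` are hypothetical:

```python
import re

# Lowercase letters and digits, single hyphens between groups, nothing else.
SEGMENT_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
RESERVED = {"about", "api"}  # extend with your own top-level routes


def validate_segment(segment: str) -> list[str]:
    """Return the list of contract violations for one path segment."""
    problems = []
    if not SEGMENT_RE.fullmatch(segment):
        problems.append("must be lowercase letters/digits separated by single hyphens")
    if segment in RESERVED:
        problems.append("collides with a reserved route")
    return problems


def build_path(loan_type: str, state: str) -> str:
    """Build /loan/{type}/{state} with no trailing slash, or raise on a violation."""
    for segment in (loan_type, state):
        problems = validate_segment(segment)
        if problems:
            raise ValueError(f"{segment!r}: {'; '.join(problems)}")
    return f"/loan/{loan_type}/{state}"


def check_slugs(slugs_by_id: dict[str, str]) -> list[str]:
    """Guardrail: report invalid, reserved, or duplicated slugs in a sample."""
    errors = []
    seen: dict[str, str] = {}
    for stable_id, slug in slugs_by_id.items():
        errors.extend(f"{slug!r} ({stable_id}): {p}" for p in validate_segment(slug))
        if slug in seen:
            errors.append(f"{slug!r} used by both {seen[slug]} and {stable_id}")
        else:
            seen[slug] = stable_id
    return errors
```

A regex cannot tell whether a segment such as `fha-in-tx` encodes two dimensions, so the Step 1 gate still needs a check against your known dimension values or a manual review.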
The internal-link injection SOP (deterministic, not random)
Random "related links" blocks leak crawl budget and link to dead-ends. Make linking a deterministic function of your data graph.
For each generated page, inject links in this fixed order:
— 1 link up to the parent hub (the {category} index). Owner: template. Gate: must exist.
— 3 sibling links to the nearest neighbors on your primary dimension (e.g. adjacent price tiers, same city). Owner: a ranked-neighbor query. Gate: siblings must themselves be indexable.
— 2 cross-dimension links (same {type}, different {region}). Gate: skip any target that is noindex.
— 1 link to the highest-authority page in the cluster (your money page).
Rules that keep it clean:
☐ Never link to a page that links back identically — break reciprocal loops.
☐ Anchor text pulls the target's H1 token, not a generic "click here."
☐ Cap total in-template links at 7 to avoid dilution.
☐ Run a monthly orphan report; any page with under 3 inbound internal links gets force-added to a neighbor's block.
Ship gate: don't publish until all boxes are checked.
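The fixed link order above can be written as a pure function of the data, so the same inputs always yield the same block. A sketch under assumptions: each page carries a numeric `rank` on the primary dimension, siblings share both type and region, and ties break on URL. The `Page` fields and `build_links` are hypothetical, and the reciprocal-loop rule is not covered here:

```python
from dataclasses import dataclass

MAX_LINKS = 7  # cap on in-template links


@dataclass(frozen=True)
class Page:
    url: str
    h1: str
    category: str        # the {type} dimension
    region: str
    rank: int            # position on the primary dimension, e.g. price tier
    indexable: bool = True


def build_links(
    page: Page, pages: list[Page], hub: Page, money_page: Page
) -> list[tuple[str, str]]:
    """Return (url, anchor text) pairs in the fixed order: hub, siblings, cross, money."""
    # Gate: never link to the page itself or to a noindex target.
    candidates = [p for p in pages if p.url != page.url and p.indexable]

    # Three nearest neighbours on the primary dimension; URL breaks ties.
    siblings = sorted(
        (p for p in candidates
         if p.category == page.category and p.region == page.region),
        key=lambda p: (abs(p.rank - page.rank), p.url),
    )[:3]

    # Two cross-dimension links: same type, different region.
    cross = sorted(
        (p for p in candidates
         if p.category == page.category and p.region != page.region),
        key=lambda p: (p.region, p.url),
    )[:2]

    links: list[tuple[str, str]] = []
    seen = {page.url}
    for target in [hub, *siblings, *cross, money_page]:
        if target.url in seen:
            continue  # skip duplicates, e.g. when the hub is also the money page
        seen.add(target.url)
        links.append((target.url, target.h1))  # anchor text comes from the target H1
    return links[:MAX_LINKS]
```

Sorting with an explicit tie-break is what makes the output deterministic; without it, two builds over the same data could shuffle the sibling links.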
