Language targeting and region targeting are different problems
A framing that resolves a lot of confusion: hreflang lets you target by language (de) or by language+region (de-AT), but never by region alone, which is what many people expect it to do.
The hypothesis we keep testing: teams reach for de-AT when they actually mean "German speakers in Austria," but the value they need depends on whether their differentiation is linguistic or commercial.
What the data and spec suggest:
— Pure language targeting (es with no region) is correct when content is identical for all Spanish speakers. Adding spurious regions here fragments your cluster and dilutes signals across near-duplicate pages.
— Language+region (es-MX vs es-AR) earns its keep only when something genuinely differs: currency, pricing, legal terms, shipping, local idiom. If the only difference is the URL, you've created an international duplicate-content management cost with no upside.
— Region-only is not expressible. There is no hreflang="AT". If you need Austria regardless of language, that's a ccTLD job, not hreflang (the Search Console country-targeting setting that used to cover this has been deprecated).
A subtle trap: the region code is ISO 3166-1 Alpha-2 (a country), not a continent or market. "Latin America" has no code. "UK" is invalid; the country is GB.
Limitation: we can't measure the counterfactual traffic of over-fragmented clusters cleanly, but the consolidation cases we've run consistently simplified crawling without measurable loss.
The self-referencing annotation everyone forgets
A finding that surprises people: a valid hreflang set must include the page itself. Each URL in the cluster lists every alternate and a tag pointing to its own URL with its own language code.
Why it's not redundant: Google uses the self-reference to confirm which canonical URL the cluster member actually is. Omit it, and the page declares its siblings but not its own identity, which weakens the cluster's internal consistency.
What we observe in malformed implementations:
— Templating systems that loop over "other languages" and forget to emit the current one. The output looks complete to a casual reviewer because every other language is present.
— Self-reference pointing to a non-canonical variant (with tracking parameters, trailing slash mismatch, or http vs https). The annotation must match the canonical exactly — a near-miss is treated as a different URL and can break reciprocity for the whole cluster.
— Self-reference present but with the wrong language code (copy-paste from a sibling), which silently mislabels the page's own locale.
The practical test: pick any URL, fetch its hreflang block, and confirm one annotation's href equals that exact URL. If it doesn't, the cluster is incomplete regardless of how many siblings are listed.
Limitation: Google is somewhat forgiving and may still cluster pages with minor inconsistencies. We've seen self-reference omissions tolerated on strong sites — but "sometimes tolerated" is a poor foundation, and the failure mode is invisible until traffic moves.
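A minimal sketch of the practical test, assuming the hreflang block is delivered as HTML link tags and the page source is already in hand. The names HreflangParser, self_reference and SAMPLE_HEAD are hypothetical, and the example.com URLs are placeholders.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collects (hreflang, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("hreflang") and attr.get("href"):
            self.alternates.append((attr["hreflang"], attr["href"]))


def self_reference(page_url, html):
    """Return the hreflang code whose href equals page_url exactly, else None."""
    parser = HreflangParser()
    parser.feed(html)
    for code, href in parser.alternates:
        if href == page_url:  # exact string match, no normalization on purpose
            return code
    return None


# Hypothetical head section: the template looped over languages and emitted en and de.
SAMPLE_HEAD = """
<head>
<link rel="alternate" hreflang="en" href="https://example.com/en/">
<link rel="alternate" hreflang="de" href="https://example.com/de/">
</head>
"""

print(self_reference("https://example.com/de/", SAMPLE_HEAD))  # de
print(self_reference("https://example.com/fr/", SAMPLE_HEAD))  # None
```

The comparison is deliberately an exact string match, so a trailing-slash or protocol near-miss shows up as a missing self-reference. The returned code can also be compared with the page's intended locale to catch the copy-paste case.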
When hreflang and canonical fight, hreflang loses
A conflict pattern worth understanding precisely, because it silently deletes pages from international results.
The scenario: /fr/ and /fr-ca/ are near-identical. To manage duplicate content, someone sets a canonical on /fr-ca/ pointing to /fr/. Simultaneously, the hreflang cluster lists both as distinct alternates.
What happens: the canonical wins. By declaring /fr-ca/ a duplicate of /fr/, you've told Google /fr-ca/ should not be indexed as itself. Its hreflang annotation now points to a URL Google has been instructed to ignore. The Canadian-French targeting evaporates.
The rule that resolves it: every page in an hreflang cluster must be self-canonical. Canonical and hreflang must agree. Cross-canonicalizing between language or region variants is almost always a mistake — they are not duplicates in Google's international model; they are alternates.
The legitimate exception: if /fr/ and /fr-ca/ truly are identical with nothing region-specific, the correct fix is to delete /fr-ca/ entirely and let fr serve both, not to keep two URLs glued by a canonical.
Diagnostic: for each cluster member, confirm rel=canonical is self-referential. Any cross-pointing canonical inside an hreflang set is the bug.
Caveat we flag: parameter-based duplicates (sort, session) should still canonicalize to the clean URL; that's orthogonal to the language cluster and not what this warning is about.
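The diagnostic can be sketched as a one-pass check, assuming the canonical of each cluster member has already been extracted into a dictionary. The function name cross_canonicals and the example.com URLs are hypothetical.

```python
def cross_canonicals(cluster):
    """Return (url, canonical) pairs for cluster members that are not self-canonical.

    `cluster` maps each member URL to the rel=canonical value found on that page
    (None if the page declares no canonical, which is also reported).
    """
    return [(url, canonical) for url, canonical in cluster.items() if canonical != url]


# Hypothetical cluster reproducing the scenario: /fr-ca/ canonicalizes to /fr/.
cluster = {
    "https://example.com/fr/": "https://example.com/fr/",
    "https://example.com/fr-ca/": "https://example.com/fr/",
}

for url, canonical in cross_canonicals(cluster):
    print(f"{url} is not self-canonical (points to {canonical})")
```

Feed it only the clean URLs that appear in the hreflang set. Parameter variants that canonicalize to those clean URLs are expected and should not be part of the input.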
The country-targeting setting you can no longer rely on
A piece of institutional memory worth correcting: the International Targeting report's country-setting in Search Console was deprecated. The old workflow — "set this subfolder to target Germany in GSC" — no longer exists as a control.
What this changes in practice:
— Geotargeting now flows from three sources Google still reads: ccTLD signals (hard), hreflang annotations (the language/region handshake), and on-page/behavioral signals (currency, address, local links, server location as a weak hint).
— For a generic TLD on subfolders, you have no explicit "this folder = this country" lever anymore. Hreflang carries that weight. This raises the stakes on getting region codes right, because there's no longer a manual override to compensate.
— The hreflang error reporting that lived in that report has partly migrated; you now lean more on the URL Inspection tool and third-party validators to confirm cluster health.
The deeper point: Google's stated direction is that explicit country-targeting was a crutch, and they'd rather infer market relevance from content and links. Whether that's better for you depends on whether your signals are clear. A US-priced page on a /uk/ subfolder with no other UK signal is now harder to correct manually.
Unknown: how much residual weight legacy GSC settings still carry for sites that set them years ago. Officially zero; we can't verify the implementation.
Hreflang does not rank you higher. It swaps which URL ranks.
A hypothesis we test in every audit because the misunderstanding is so costly: people implement hreflang expecting a ranking lift, see no traffic increase, and conclude it "didn't work."
What hreflang actually does: it's a URL-swapping mechanism, not a ranking factor. When your cluster ranks for a query, hreflang influences which language/region variant Google shows that specific user — not the position the cluster occupies.
What the data suggests this produces:
— Position in the SERP: largely unchanged. The cluster ranks on its own merits (content, links, relevance).
— CTR and engagement: this is where the gain hides. Showing a US user the USD-priced /en-us/ instead of /en-gb/ improves click-through and reduces pogo-sticking, which can indirectly help over time.
— Cannibalization relief: correct hreflang stops two language variants from competing for the same impression, consolidating signals onto the right URL per locale.
So "traffic didn't move" is often a measurement artifact — total clicks held, but the right pages now receive them, visible only in a per-locale breakdown.
Caveats and honesty: the indirect engagement-to-ranking pathway is real but small and hard to isolate. We won't claim hreflang lifts rankings directly — the evidence doesn't support it. Its value is correctness and user-URL matching, which is plenty, but sell it as the right thing, not a growth hack.
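A sketch of the per-locale breakdown, assuming click data can be exported as (user country, landing variant, clicks) rows. The numbers are invented purely to show the shape of the comparison, and the names EXPECTED, total_clicks and matched_share are hypothetical.

```python
# Which variant each country's users should land on (hypothetical mapping).
EXPECTED = {"US": "/en-us/", "GB": "/en-gb/"}


def total_clicks(rows):
    """Sum clicks across all rows."""
    return sum(clicks for _, _, clicks in rows)


def matched_share(rows, expected):
    """Share of clicks that landed on the variant intended for the user's country."""
    matched = sum(
        clicks for country, variant, clicks in rows if expected.get(country) == variant
    )
    return matched / total_clicks(rows)


# Invented illustration data: rows are (user_country, landing_variant, clicks).
before = [("US", "/en-gb/", 400), ("US", "/en-us/", 600), ("GB", "/en-gb/", 500)]
after = [("US", "/en-gb/", 50), ("US", "/en-us/", 950), ("GB", "/en-gb/", 500)]

print(total_clicks(before), total_clicks(after))
print(f"{matched_share(before, EXPECTED):.0%} -> {matched_share(after, EXPECTED):.0%}")
```

In this made-up case the totals are identical before and after, so a top-line report shows nothing, while the matched share is where the change appears.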
The invalid-code errors that pass casual review
An audit pattern: hreflang values that look plausible to a human but are silently invalid, so Google ignores the annotation entirely while Search Console may not loudly complain.
The code structure, precisely: language must be ISO 639-1 (two letters), optional region must be ISO 3166-1 Alpha-2, joined by a hyphen, language first. en-GB, not gb-en, not en_GB, not en-UK.
The specific traps we catch repeatedly:
— en-UK: "UK" is not a valid region code. The country is GB. This one is almost universal and almost always wrong.
— en-EU, es-LATAM: the EU and Latin America are not countries; no Alpha-2 exists. There is no way to target "Europe" or "Latin America" with one region code.
— Underscore instead of hyphen (en_us): copied from locale strings in code. Invalid in hreflang.
— Wrong order (us-en): region-language is backwards.
— zh-CN vs script subtags: you can specify script (zh-Hans) per BCP-47, which is sometimes more correct than region for Chinese, but tooling support varies.
Why it's insidious: an invalid value doesn't error the whole page; Google just drops that one annotation, so a 12-language cluster quietly becomes an 11-language cluster with no obvious symptom.
Limitation: validators disagree on edge cases like script subtags and deprecated codes. When in doubt, prefer the simplest valid language-only code over a clever region guess.
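A rough sketch of a checker for the traps above. The LANGUAGES and REGIONS sets are small illustrative subsets, not the full ISO lists, and check_hreflang is a hypothetical helper. Script subtags are reported as unhandled rather than judged, and x-default is out of scope here.

```python
# Illustrative subsets only; a real check would load the complete
# ISO 639-1 and ISO 3166-1 Alpha-2 code lists.
LANGUAGES = {"de", "en", "es", "fr", "zh"}
REGIONS = {"AR", "AT", "CA", "CN", "DE", "GB", "MX", "US"}


def check_hreflang(value):
    """Return None if the value looks valid, otherwise a short reason string."""
    if "_" in value:
        return "underscore instead of hyphen"
    parts = value.split("-")
    if len(parts) > 2:
        return "more than two subtags: not handled by this sketch"
    lang = parts[0].lower()
    if lang not in LANGUAGES:
        if lang.upper() in REGIONS:
            return "region code in the language position"
        return "unknown language code"
    if len(parts) == 2:
        second = parts[1]
        if len(second) == 4:
            return "script subtag: not handled by this sketch"
        if second.upper() not in REGIONS:
            return "unknown region code"
    return None


for value in ["en-GB", "es", "en-UK", "en_us", "us-en", "es-LATAM", "zh-Hans"]:
    print(value, "->", check_hreflang(value) or "ok")
```

Running this over every annotation in a cluster and counting the non-ok results gives a quick read on how many annotations are likely being dropped.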
Three places to put hreflang — and when each one wins
Hreflang can live in the HTML head, in HTTP headers, or in the XML sitemap. They're functionally equivalent to Google, but the operational tradeoffs differ sharply, and mixing them causes problems.
What we've found across implementations:
— HTML <link> tags: most common, easiest to inspect, but they bloat every page. A 50-language cluster adds 50 link tags to all 50 pages; that's 2,500 annotations to keep synchronized, and page weight grows linearly. Fine for small clusters.
— HTTP headers: the only option for non-HTML files (PDFs, etc.). Operationally fiddly; hard to audit because you can't see them in page source without inspecting response headers.
— XML sitemap: scales best for large clusters. The annotations live in one centralized, machine-generated file rather than scattered across thousands of pages. Easiest to regenerate atomically when a locale is added or removed.
The critical rule: pick one method per cluster. Don't put partial hreflang in HTML and the rest in the sitemap — Google reads all sources, and inconsistencies between them create contradictions that can invalidate the cluster.
The scaling argument: past roughly 10 languages, the sitemap method's centralization usually pays for itself in maintainability and reduced reciprocity errors.
Caveat: sitemap hreflang is the least visible to third-party tools and the hardest for new team members to discover, so document it. The right choice trades inspectability against maintainability.
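A sketch of the centralized sitemap approach, assuming each cluster is held as a mapping from hreflang code to URL. The function name build_sitemap and the example.com URLs are hypothetical. The output uses the standard sitemap namespace with xhtml:link elements for the alternates.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_sitemap(clusters):
    """Build sitemap XML in which every URL lists all alternates, itself included.

    `clusters` is a list of dicts, each mapping an hreflang code to a URL.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for cluster in clusters:
        for url in cluster.values():
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
            # Same full list on every member, so reciprocity holds by construction.
            for code, href in cluster.items():
                ET.SubElement(
                    url_el,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": code, "href": href},
                )
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical two-locale cluster.
example = [{"en": "https://example.com/en/", "de": "https://example.com/de/"}]
print(build_sitemap(example))
```

Because every member's annotations come from one mapping, adding or removing a locale is a single change followed by a regenerate, which is the maintainability argument in practice.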
The crawl-budget tax of large hreflang clusters
An under-discussed cost: hreflang creates a dense graph of cross-references, and Google must crawl and reconcile every node before it can trust the cluster. For large multilingual sites, this interacts with crawl budget in ways that delay correct behavior.
The mechanism: a cluster is only valid once Google has recrawled enough members to confirm reciprocity. Add a 40th language, and the annotation isn't confirmed until the crawler revisits a meaningful share of the other 39 pages to see the new return tags.
What we observe:
— On sites with constrained crawl budget, newly added locales can take weeks to activate, because the bottleneck is recrawl frequency of the slowest cluster members, not the new page itself.
— Frequent structural churn (adding/removing locales, changing URL patterns) keeps the cluster in a perpetual semi-confirmed state. Each change resets the reconciliation clock for affected nodes.
— Sitemaps with accurate lastmod meaningfully help here: they hint which members changed, focusing recrawl where it matters.
The practical implication: international rollouts should be batched and stable, not drip-fed. Adding 20 markets at once, then leaving the structure alone, confirms faster than adding one per week for 20 weeks.
Caveats: crawl budget is mostly a concern at scale (hundreds of thousands of URLs); small sites rarely hit it. And we can't directly measure Google's internal reconciliation state — we infer it from the lag between deploying tags and seeing correct per-locale serving.
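The reciprocity requirement can be sketched as a graph check, assuming you keep a snapshot of which alternates each page listed when it was last fetched. This models your own crawl data, not Google's internal state. The name missing_return_tags and the example.com URLs are hypothetical.

```python
def missing_return_tags(annotations):
    """Return sorted (source, target) pairs where source lists target
    as an alternate but target does not list source back.

    `annotations` maps each page URL to the set of URLs it lists as alternates.
    """
    missing = []
    for source, targets in annotations.items():
        for target in targets:
            if target == source:
                continue  # self-reference needs no return tag
            if source not in annotations.get(target, set()):
                missing.append((source, target))
    return sorted(missing)


EN = "https://example.com/en/"
DE = "https://example.com/de/"
IT = "https://example.com/it/"

# Hypothetical snapshot: /it/ was just added and lists everyone, but the stored
# copies of /en/ and /de/ predate the change and do not list /it/ yet.
snapshot = {
    EN: {EN, DE},
    DE: {EN, DE},
    IT: {EN, DE, IT},
}

for source, target in missing_return_tags(snapshot):
    print(f"{source} -> {target} has no return tag yet")
```

In this model a new locale is only fully confirmed once every older member has been refetched, which is why the slowest-crawled pages set the pace.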
Relative URLs and protocol mismatches break clusters quietly
A mechanical failure mode that produces no obvious error but severs the cluster: hreflang hrefs that aren't fully-qualified, canonical, absolute URLs.
The requirements, stated exactly:
— Absolute, not relative. href="/de/" is invalid in this context; it must be https://example.com/de/. Some parsers tolerate relative, but the spec and Google expect absolute, and tolerance is not something to depend on.
— Protocol must match reality. If the site is on HTTPS but the hreflang href says http://, that's a different URL. Google follows it, may get redirected, and the redirect breaks the clean reciprocity match.
— Trailing-slash consistency. /de and /de/ are different URLs. If the annotation says one and the canonical is the other, the return-tag handshake fails on a character.
— www vs non-www, and tracking parameters, same story: any divergence from the exact canonical URL is a different node.
Why it's hard to catch: each individual URL still resolves (via redirect), the page still loads, nothing 404s. The damage is purely in cluster reconciliation — Google sees the declared alternate and the actual canonical as two different things and declines to confirm reciprocity.
The audit: extract every hreflang href, and confirm each is byte-for-byte the canonical URL of its target, including protocol, host, path, and slash. Caveat: a few redirects in a large cluster won't necessarily collapse it — Google is somewhat resilient — but they accumulate into the "no return tags" warning.
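A minimal sketch of the audit, assuming each hreflang href has been paired with the canonical URL its target page declares. The function name href_problems is hypothetical and the example.com URLs are placeholders. It names the kind of divergence rather than only reporting a mismatch.

```python
from urllib.parse import urlsplit


def href_problems(href, canonical):
    """List the ways an hreflang href diverges from its target's canonical URL."""
    h, c = urlsplit(href), urlsplit(canonical)
    if not h.scheme or not h.netloc:
        return ["not absolute"]
    problems = []
    if h.scheme != c.scheme:
        problems.append("protocol mismatch")
    if h.netloc != c.netloc:
        problems.append("host mismatch")
    if h.path != c.path:
        if h.path.rstrip("/") == c.path.rstrip("/"):
            problems.append("trailing-slash mismatch")
        else:
            problems.append("path mismatch")
    if h.query != c.query:
        problems.append("query string mismatch")
    if not problems and href != canonical:
        problems.append("other difference")  # e.g. a fragment
    return problems


canonical = "https://example.com/de/"
for href in [
    "/de/",
    "http://example.com/de/",
    "https://example.com/de",
    "https://www.example.com/de/?utm_source=x",
    "https://example.com/de/",
]:
    print(href, "->", href_problems(href, canonical) or "ok")
```

An empty list means the href matches the canonical exactly. Anything else is a candidate for the "no return tags" warning.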
International near-duplicates: why Google mostly forgives them
A reassurance grounded in how Google's model actually works, because fear of "duplicate content penalties" drives a lot of bad hreflang decisions.
The hypothesis to dispel: that /en-us/ and /en-gb/ with 95% identical text trigger a duplicate-content penalty requiring canonicalization to one master.
What the evidence and Google's statements suggest:
— There is no duplicate-content penalty for legitimate international variants. Google explicitly recognizes that the same product page in US and UK English is a normal, expected pattern, not manipulation.
— Hreflang is precisely the mechanism that tells Google "these are localized equivalents, pick the right one per user" — which is the opposite of asking it to choose one and suppress the rest.
— Without hreflang, Google may still figure it out, but it might consolidate the variants and serve the "wrong" one (a UK user seeing USD prices), or pick one to index and filter the others as duplicates.
So the risk of international duplicates isn't a penalty — it's wrong-variant serving and signal dilution. Hreflang fixes serving; it doesn't need to fix a penalty that doesn't exist.
The genuine caveat: if your only differentiation between locales is the URL and a flag in the corner, you have thin localization. Google may still cluster them and the hreflang adds little. The fix there isn't more annotation — it's more genuine localization (price, currency, examples, spelling) so the variants earn their separate existence.