Content is key for SEO, as you already know. But what does it take to help a search engine understand which language each page is published in, and which target markets it is intended for?
The CMS backend of the Sitecore Experience Platform greatly supports efficient multilingual content management & publishing. This blog post therefore takes a look at your website "from an outside perspective": how to deliver the right content to the external group of website stakeholders, the "website visitors", which I introduced in a previous blog post explaining the conceptual basics & recommended principles for building global multilingual websites.
How do Google, Baidu, and … know what language they are currently crawling on your website?
Think about it: for the website visitor we usually have a language selector and indicator on every webpage, so she or he can easily spot and change the content language of the webpage. But did you ever ask yourself:
how does a search crawler actually know which language the webpage content it is currently crawling is written in?
Crawlers don't scan a language-selector flyout to find this out, and they can't read such information from somewhere in the footer of your website…
That's why we need to proactively help and enable search engines to understand our (multilingual) content!
If we don't do this and simply throw content in various languages at them, you will very likely see a mess in which language versions of your webpages turn up in search results in different countries.
Key points for achieving good internationally targeted SEO
In the following, I will go through 3 basic characteristics that affect every single webpage, and which aspects of each should be considered for achieving internationally targeted SEO.
- indicating language cultures in the page URL
- adding content culture related directives in the HTML-markup
- aligning the sitemap.xml file with every available page language version
1. the page URL
You have to indicate the intended content language (or the language & country of the content) in every web page’s URL. Different approaches are possible; evaluate which works best for you and your website, and implement that scheme.
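As a minimal sketch of what such schemes can look like, the following Python snippet builds a culture-aware URL in three commonly used ways: a subdirectory, a subdomain, or a country-code top-level domain. The domain example.com, the function name `culture_url` and the scheme labels are assumptions made for illustration only; they are not Sitecore settings.

```python
def culture_url(scheme: str, language: str, country: str, path: str) -> str:
    """Build a URL that carries the culture, using one of three illustrative schemes."""
    language = language.lower()
    country = country.lower()
    clean_path = path.lstrip("/")

    if scheme == "subdirectory":
        # Culture as the first path segment, e.g. /en-us/products
        return f"https://www.example.com/{language}-{country}/{clean_path}"
    if scheme == "subdomain":
        # Culture as a subdomain, e.g. de-ch.example.com
        return f"https://{language}-{country}.example.com/{clean_path}"
    if scheme == "cctld":
        # Country in the top-level domain, language as the first path segment
        return f"https://www.example.{country}/{language}/{clean_path}"
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme in ("subdirectory", "subdomain", "cctld"):
        print(scheme, culture_url(scheme, "de", "CH", "/products"))
```

Whichever scheme you pick, the point is that the culture can be read from the URL alone.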
2. the HTML-markup
Besides indicating a webpage's language culture in the URL, you also need to add *hidden* information to every page that tells search crawlers its content language and country focus, as well as where to find the related content in other languages. This prevents duplicate content warnings and helps search crawlers that access your website from different locations on earth find the relevant page in the right language.
Therefore, in every page’s HTML head (markup), include the following technical directives:
- the language & country code (ISO) the page is intended for, and the corresponding URL
- references to all other language & country versions of the same page
- directives on how to handle generic language-only requests
- an ultimate "fallback" using the “x-default” directive
Check out example HTML-markup for all these mentioned topics in this GitHub Gist: namics-gists/International targeting SEO code snippets
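To make the first two directives more tangible, here is a minimal sketch in Python that renders the `<link rel="alternate" hreflang="…">` tags for one page. The function name `hreflang_links`, the example.com URLs and the culture codes are assumptions for illustration; in a real solution the URLs would come from your CMS.

```python
from html import escape


def hreflang_links(versions: dict[str, str]) -> list[str]:
    """Render one alternate link tag per culture version of a page.

    `versions` maps a culture code (e.g. "en-us") to the absolute URL of
    that version. The current page is expected to be part of the mapping,
    so every version also references itself.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in versions.items()
    ]


if __name__ == "__main__":
    # Hypothetical culture versions of the same page
    page_versions = {
        "en-us": "https://www.example.com/en-us/products",
        "de-de": "https://www.example.com/de-de/products",
    }
    print("\n".join(hreflang_links(page_versions)))
```

The same list of tags goes into the head of every culture version of the page, so that all versions reference each other.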
alternate hreflang 2-char language-only cultures
In a use case where you publish multiple English versions of the same page, for example "en-US", "en-AU" and "en-CA" versions, you must tell search engines which version to serve to people whose “country” part cannot be resolved when they access your page.
Therefore: if you use multiple cultures that combine ISO 639-1 language codes and ISO 3166 country codes in the form of "language-country" codes (e.g. "en-us", "fr-fr",…), make sure to also add references for all *generic* language-only versions pointing to the correct page URL, e.g. “de”, “en”, “fr”, “it”, etc.
The reason is that the country is resolved either from the visitor's IP address or from information delivered by her or his browser, and this information is not 100% reliable and can be spoofed.
Don’t forget to add an ultimate "fallback" consisting of the “x-default” directive! This applies in situations where neither the language nor the country of the visitor can be resolved for displaying the appropriate culture version of a page.
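The two fallbacks can be sketched as a small Python helper that extends a page's culture-to-URL mapping with language-only entries and an “x-default” entry. The function name `with_fallbacks`, its parameters and the example.com URLs are assumptions; which culture serves as the default for a language is an editorial decision, not something the code can derive.

```python
def with_fallbacks(
    versions: dict[str, str],
    language_defaults: dict[str, str],
    x_default: str,
) -> dict[str, str]:
    """Add language-only and x-default entries to a culture-to-URL mapping.

    `language_defaults` maps a 2-char language code to the culture that
    should answer generic requests for that language, e.g. {"en": "en-us"}.
    `x_default` names the culture used when nothing can be resolved.
    """
    result = dict(versions)
    for language, culture in language_defaults.items():
        result[language] = versions[culture]
    result["x-default"] = versions[x_default]
    return result


if __name__ == "__main__":
    # Hypothetical culture versions of the same page
    page_versions = {
        "en-us": "https://www.example.com/en-us/products",
        "en-au": "https://www.example.com/en-au/products",
        "de-de": "https://www.example.com/de-de/products",
    }
    hreflang_map = with_fallbacks(page_versions, {"en": "en-us", "de": "de-de"}, "en-us")
    for code, url in hreflang_map.items():
        print(f'<link rel="alternate" hreflang="{code}" href="{url}" />')
```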
Note: these recommendations all apply in addition to the canonical URL that you add to the markup of every single webpage, indicating its main URL including the culture information you assigned to it. Even though a /en-us/ and a /de-de/ page have the same content in translation, each version has its own unique canonical URL.
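A minimal sketch of this rule in Python: one canonical tag per culture version, plus a check that no two versions share the same canonical URL. The function name `canonical_tags` and the example.com URLs are assumptions for illustration.

```python
from html import escape


def canonical_tags(versions: dict[str, str]) -> dict[str, str]:
    """Return the canonical link tag for every culture version of a page.

    Raises ValueError if two culture versions share the same URL, because
    each translated version needs its own canonical URL.
    """
    if len(set(versions.values())) != len(versions):
        raise ValueError("every culture version needs its own canonical URL")
    return {
        culture: f'<link rel="canonical" href="{escape(url)}" />'
        for culture, url in versions.items()
    }


if __name__ == "__main__":
    # Hypothetical culture versions of the same page
    tags = canonical_tags({
        "en-us": "https://www.example.com/en-us/products",
        "de-de": "https://www.example.com/de-de/products",
    })
    print(tags["de-de"])
```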
3. the sitemap.xml file
I'm sure you have a basic understanding of what the sitemap.xml is and what it is intended for. In terms of optimal international targeting, make sure to represent your website structure including language & country information in the sitemap.xml file, or in the various sitemap.xml files, too!
- Add an individual sitemap.xml for every available language & country version
- URLs must be absolute, contain language & country information, and thus be identical to the canonical URLs you render in the HTML markup of each page
- Don’t forget to manually add every sitemap.xml via the Google Search Console… and, if provided, do the same for other search crawlers, too
- To avoid getting lost with multiple sitemap.xml files for the different cultures of your website, combine them in a master/overview XML file
Example culture-specific sitemap.xml content:
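As a hedged illustration, the following Python sketch generates such a culture-specific sitemap with the standard library. It lists the absolute URL of each page for one culture and, as an assumption on my side, also annotates every entry with `xhtml:link` alternates for the other culture versions. The function name `culture_sitemap` and the example.com URLs are made up for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def culture_sitemap(pages: list[dict[str, str]], culture: str) -> str:
    """Build the sitemap XML for one culture.

    Each item in `pages` maps culture codes to the absolute URLs of one
    page. The <loc> is the URL for `culture`; all versions (including the
    current one) are added as xhtml:link alternates.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for versions in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = versions[culture]
        for code, href in versions.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": code, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical site with a single page in two cultures
    site_pages = [{
        "en-us": "https://www.example.com/en-us/products",
        "de-de": "https://www.example.com/de-de/products",
    }]
    print(culture_sitemap(site_pages, "de-de"))
```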
Example master xml for combining multiple sitemap.xml files:
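A minimal sketch of such a master file, using the sitemap index format of the sitemaps.org protocol and the Python standard library. The function name `sitemap_index` and the file names are assumptions; use whatever naming your site actually serves.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_index(sitemap_urls: list[str]) -> str:
    """Build a sitemap index that references one sitemap.xml per culture."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical per-culture sitemap files
    print(sitemap_index([
        "https://www.example.com/sitemap-en-us.xml",
        "https://www.example.com/sitemap-de-de.xml",
    ]))
```

With an index like this, you only need to keep track of one entry point per website, while each culture keeps its own file.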
Besides that, the general rules and recommendations for sitemap.xml files still apply, most importantly proper structure and content.
Sounds easy so far, right? ;)
The big challenge comes up once you realise that you want all of this to be dynamic and to happen automatically, while still being configurable & editable to a certain point by your content managers!
So here’s the groundwork to build it accordingly:
Generating SEO-friendly URLs that support multiple languages and countries
Make use of the Sitecore built-in Link Manager and, if required, extend it to achieve your goals.
Once you have configured the Link Manager to your requirements, you can output culture-related URLs for various use cases, for example:
// Get the UrlOptions from settings
var urlOptions = _linkManager.GetDefaultUrlOptions();
// Set language
urlOptions.Language = language;
// Optional: Include server url for XML sitemap or alternate links
urlOptions.AlwaysIncludeServerUrl = true;
// Generate url. Best practice: Use BaseLinkManager
var url = _linkManager.GetItemUrl(item, urlOptions);
Making your life easier: grab a free in-depth checklist
You can grab an extensive Excel checklist covering the to-dos for internationally targeted SEO and the related sitemap.xml optimizations:
- Check if your website fulfils all requirements: SEO – International targeting – Checklist.zip
If you are interested in boosting your website's SEO and need support in working through this checklist, feel free to get in touch with us!
In the next post of this series you will learn how you can identify the language and country origin of a visitor on your website and how the Sitecore Experience Platform helps deliver the right content to the right audience.
This October my colleague Fabian Geiger & I had the pleasure of holding a breakout session at the Sitecore Symposium 2018 in Orlando. This blog post is a summary of, and follow-up to, the knowledge we shared there on building globally focused websites based on the Sitecore Experience Platform, with multiple languages and country cultures. Consider this a non-conclusive compilation of learnings and recommendations from our project work at Namics.