Search engines are getting smarter, but sometimes you need to offer a little help so bots know where to go. And that's exactly what XML sitemaps do.
What is an XML sitemap?
An XML sitemap is an important part of optimizing any website. It is a file that lists the pages, posts, images and other files of your website, together with information about each of them.
Not only does it give bots a comprehensive list of all the public pages on a website, it also gives search engines a better idea of the site's information architecture: the hierarchy of its pages and how often its content is updated.
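To make this concrete, here is a minimal sketch of what a sitemap file looks like; the URLs, dates and values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per public, indexable page -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/blog/sample-post/</loc>
    <lastmod>2024-01-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Only the <loc> element is required; <lastmod>, <changefreq> and <priority> are optional hints that search engines may or may not take into account.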
Advantages of implementing an XML sitemap
Generating an XML sitemap and submitting it to search engines is one of the most important steps you can take to ensure that your website is indexed correctly. The main advantages of keeping an up-to-date sitemap are:
- Improved indexing. Submitting your URLs makes it easier for search engines to discover them and consider them for inclusion in the Google index.
- Better crawl prioritization. For example, if your sitemap mirrors your site's hierarchy, search engines are more likely to give more weight to URLs that sit higher in the content hierarchy of your website than to those further down.
- Error detection through tools such as Search Console. Creating the sitemap and submitting it to Google Search Console not only allows Google to find it more easily, it also shows which URLs are correct and surfaces possible errors so you can fix them.
How to create an XML sitemap
There are several ways to generate a sitemap. The main ones are:
- Through a CMS. Some platforms, such as WordPress, have SEO features that include XML sitemap generators, and plugins such as Yoast offer one of the best sitemap generators available.
- Using an online generator such as XML Sitemaps Generator.
- Creating it by hand, following Google's guidelines to get better results and avoid mistakes.
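If none of these options fits, a small script can also produce the file. Below is a minimal sketch in Python, assuming a hand-maintained list of URLs (the domain and paths are placeholders):

```python
from datetime import date
from xml.sax.saxutils import escape

# Placeholder list of the site's public, indexable URLs
urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/",
    "https://www.example.com/contact/",
]

def build_sitemap(urls):
    """Return a minimal sitemap.xml document as a string."""
    today = date.today().isoformat()
    entries = "\n".join(
        "  <url>\n"
        f"    <loc>{escape(u)}</loc>\n"
        f"    <lastmod>{today}</lastmod>\n"
        "  </url>"
        for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(build_sitemap(urls))
```

The same idea scales to pulling the URL list from a database or a crawl export instead of a hard-coded list.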
A sitemap should be a living document, so it must be updated as content is added to or removed from the website.
How to help search engines find the sitemap
First, submit it through Google Search Console. Besides helping search engines find a website's sitemap, tools like this also provide information and diagnostics about the URLs listed in it.
Second, it is also a good idea to reference the sitemap URL in the robots.txt file so that search engines know where to find it.
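For example, a single Sitemap directive anywhere in robots.txt is enough; the domain is a placeholder:

```
# robots.txt
User-agent: *
Disallow:

# Absolute URL of the XML sitemap
Sitemap: https://www.example.com/sitemap.xml
```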
Points to keep in mind to optimize a website’s sitemap
Eliminate unnecessary URLs and prioritize high-quality pages
It is quite common to have live pages that we do not want indexed, whether because of duplicate content, paginated listings, and so on. These URLs should be left out of the sitemap so that only high-quality, indexable pages remain.
Do not include broken or redirected URLs in the sitemap
This may seem obvious, but it is easy to forget to update the sitemap when URLs are redirected. For this reason, it is important to verify the status code of every URL with tools such as Screaming Frog (or a simple script, as sketched after this list), paying attention to the following values:
- 3xx redirects. These responses from the server mean that the URL has been redirected. If this is the case, it is best to remove the URL from the sitemap and add the final URL that it points to.
- 4xx errors. These responses from the server mean that the requested page does not exist or has a problem. If the page has been removed permanently, its URL must be removed from the XML sitemap.
- 401/403 forbidden. Some pages require a login to access. Since search engine bots cannot log in, they will simply receive a forbidden response. If any of these pages appear in the sitemap, they must be removed. (Genuine 5xx responses are server errors and should be fixed rather than listed.)
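A quick way to run this check without a dedicated crawler is a short script that requests every URL in the sitemap and reports anything that does not return a 200. A minimal sketch in Python using the requests library (the sitemap URL is a placeholder):

```python
import xml.etree.ElementTree as ET
import requests

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Fetch the sitemap and extract every <loc> entry
sitemap = requests.get(SITEMAP_URL, timeout=10)
sitemap.raise_for_status()
locs = [loc.text for loc in ET.fromstring(sitemap.content).findall(".//sm:loc", NS)]

# HEAD requests keep the check light; redirects are not followed
# so 3xx responses stay visible
for url in locs:
    try:
        status = requests.head(url, allow_redirects=False, timeout=10).status_code
    except requests.RequestException as exc:
        status = f"request failed: {exc}"
    if status != 200:
        print(f"{status}  {url}")
```

Any 3xx entries it reports should be replaced by their final destination URL, and 4xx or forbidden pages removed.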
Include only canonical versions of URLs
As in the previous point, some pages may not be indexable because of special HTML tags, such as canonicalized pages. Check that the sitemap does not contain any URL whose rel="canonical" tag points to a different page; only the canonical versions should be listed.
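For example, if a page's HTML contains a tag like the one below, it is the target URL, not the page itself, that belongs in the sitemap (the URL is a placeholder):

```html
<!-- This page declares another URL as the canonical version -->
<link rel="canonical" href="https://www.example.com/original-page/" />
```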
Use the Robots meta tag instead of Robots.txt whenever possible
As a general rule, when we do not want a page indexed, we should define the meta robots tag "noindex, follow" in the page's HTML. This prevents Google from indexing the page but preserves the value of the links it contains.
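In practice, that means adding a tag like this inside the page's <head>:

```html
<!-- Keep this page out of the index, but let bots follow its links -->
<meta name="robots" content="noindex, follow">
```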
If the page is still being crawled and consuming crawl budget, you could then consider blocking it through robots.txt.
Do not include "noindex" URLs in the sitemap
Some pages in the sitemap may not be indexable because of special HTML tags, such as the "noindex" meta robots tag. This is a clear example of wasted crawl budget: if search engine robots cannot index certain pages, it makes no sense to include them in the website's sitemap.
Monitor Search Console reports
As we’ve discussed, Search Console has a sitemap report that provides a lot of useful information about the status of the sitemap and the pages listed on it.
We can obtain a detailed report with one of the following statuses for each submitted sitemap:
- Success: the sitemap was uploaded and processed correctly, without errors.
- Has errors: the sitemap could be parsed, but it contains some errors.
- Couldn't fetch: the sitemap could not be retrieved for some reason.
In conclusion, taking a look at your sitemap report in Search Console allows you to quickly spot the issues to fix.