A sitemap is an XML file that lists your site’s pages, images, videos, and other files. Search engine bots such as Googlebot read this information so the search engine can crawl your site more effectively.
Because you create the sitemap, it also tells Google which pages and files you consider the most important on your site. The sitemap also gives the search engine useful details about those pages and files.
For a page, for example, these details can include when it was last updated, how often it changes, and any alternate language versions of it.
According to Google, a single sitemap can be no larger than 50MB (uncompressed) and can contain no more than 50,000 URLs. If your sitemap exceeds either limit, you will need to split it into multiple sitemaps.
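If you generate sitemaps with a script, one way to stay under both limits is to split the URL list into batches of at most 50,000 and check each file's uncompressed size before publishing. The Python below is a minimal sketch, not an official tool: `chunk_urls` and `fits_limits` are hypothetical helper names, the example.com URLs are placeholders, and 50MB is taken as 52,428,800 bytes, the figure Sitemaps.org gives.

```python
from typing import Iterator, List

MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50MB, measured on the uncompressed file


def chunk_urls(urls: List[str], max_urls: int = MAX_URLS) -> Iterator[List[str]]:
    """Yield consecutive batches of at most max_urls URLs (one batch per sitemap)."""
    for start in range(0, len(urls), max_urls):
        yield urls[start:start + max_urls]


def fits_limits(sitemap_bytes: bytes, url_count: int) -> bool:
    """Return True if an uncompressed sitemap is within both the size and URL limits."""
    return len(sitemap_bytes) <= MAX_BYTES and url_count <= MAX_URLS


if __name__ == "__main__":
    # Placeholder URLs, just to show how the list is split.
    urls = [f"https://www.example.com/page-{i}" for i in range(120_001)]
    batches = list(chunk_urls(urls))
    print(len(batches), [len(batch) for batch in batches])
```

Each batch would then be written to its own sitemap file; the serialization step is left out of this sketch.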
Below is a simple example of an XML sitemap that shows the location of a single URL.
Sitemaps.org has many more examples of sitemaps, including complex scenarios, along with full documentation of the protocol.
Let’s break down the components of a sitemap.
<?xml version="1.0" encoding="UTF-8"?>
The first line of the sitemap is the XML declaration, which tells search engines they are reading an XML file. It also states the XML version, in this case “1.0”, the version used in the Sitemaps.org examples.
The declaration also specifies the character encoding, which must be UTF-8.
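As an illustration, Python's standard `xml.etree.ElementTree` module can produce a single-URL sitemap like the one discussed here, including the XML declaration with UTF-8 encoding. This is a sketch assuming Python 3.8 or later (for the `xml_declaration` and `default_namespace` arguments); `build_sitemap` is a hypothetical helper, and the output is not pretty-printed.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: Iterable[Tuple[str, Optional[str]]]) -> bytes:
    """Serialize (loc, lastmod) pairs as a UTF-8 sitemap; lastmod may be None."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # xml_declaration=True writes the <?xml ...?> line naming the encoding below;
    # default_namespace turns the sitemap namespace into a plain xmlns="..." attribute.
    return ET.tostring(
        urlset,
        encoding="UTF-8",
        xml_declaration=True,
        default_namespace=SITEMAP_NS,
    )


if __name__ == "__main__":
    data = build_sitemap([("https://www.seoclarity.net", "2019-08-21T16:12:20+03:00")])
    print(data.decode("utf-8"))
```

ElementTree writes the declaration with single quotes (`<?xml version='1.0' encoding='UTF-8'?>`), which XML parsers treat the same as double quotes.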
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
The next line opens the <urlset> tag. This is a container for all the URLs in the sitemap, and it is closed with a matching </urlset> tag at the end of the file.
The xmlns attribute declares the sitemap namespace, and the last portion of its value, “/sitemap/0.9”, indicates the protocol standard: Sitemap 0.9. This is the common standard supported by major search engine crawlers, including Google and Bing.
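The namespace matters when a program reads a sitemap: every tag name is qualified by the namespace URI, so a parser has to look tags up with it. The sketch below assumes Python's standard `xml.etree.ElementTree`; `list_locations` is a hypothetical helper and the `sm` prefix is an arbitrary local alias.

```python
import xml.etree.ElementTree as ET
from typing import List

# "sm" is an arbitrary local alias; only the namespace URI has to match.
NAMESPACES = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_locations(sitemap_xml: bytes) -> List[str]:
    """Return every <loc> value in a sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    # Inside the namespace, <loc> is really {http://www.sitemaps.org/schemas/sitemap/0.9}loc,
    # so an unqualified path such as "url/loc" would match nothing.
    return [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", NAMESPACES)]


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.seoclarity.net</loc>
    <lastmod>2019-08-21T16:12:20+03:00</lastmod>
  </url>
</urlset>"""
    print(list_locations(sample))
```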
<url>
<loc>https://www.seoclarity.net</loc>
<lastmod>2019-08-21T16:12:20+03:00</lastmod>
</url>
The <url> tag is the parent tag for each URL entry. The location of the URL, in this case seoClarity’s homepage, must be placed within the <loc> tags.
These URLs must be absolute rather than relative, and they should be the canonical versions of your pages.
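If your site or CMS produces relative links, they can be resolved against the site's base URL before they go into <loc>. This minimal Python sketch uses the standard `urllib.parse` module; `to_absolute` and `is_absolute` are hypothetical helpers. It only checks that a URL is absolute; whether a URL is canonical depends on your site's canonical tags and redirects, which this code does not inspect.

```python
from urllib.parse import urljoin, urlsplit


def to_absolute(base_url: str, href: str) -> str:
    """Resolve a possibly relative link against the site's base URL."""
    return urljoin(base_url, href)


def is_absolute(url: str) -> bool:
    """True if the URL has an http(s) scheme and a host, as a <loc> value needs."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    base = "https://www.seoclarity.net/"
    for href in ["/blog/", "pricing", "https://www.seoclarity.net/about"]:
        absolute = to_absolute(base, href)
        print(href, "->", absolute, is_absolute(absolute))
```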
Each <url> entry can also include optional tags, as documented on Sitemaps.org; the <lastmod> tag in the example above is one of them. These optional tags are not crucial to your SEO (Google, for one, says it ignores <priority> and <changefreq>), but they are available if you choose to use them:
<lastmod>: The date the file was last modified, which can optionally include the time. The value must use the W3C Datetime format, such as YYYY-MM-DD for a date alone or a full timestamp like the one in the example above.
<priority>: Remember that the sitemap contains your most important URLs. The <priority> tag is another way to specify the importance of each URL relative to the other URLs on your site. The value must be between 0.0 and 1.0, where 1.0 is the highest priority; the default is 0.5.
<changefreq>: How frequently the page is likely to change. Valid values are always, hourly, daily, weekly, monthly, yearly, and never. This tells search engine bots how often they may want to come back and recrawl the page, though crawlers treat it as a hint rather than a command.
Google offers several general sitemap guidelines to follow when building a sitemap and submitting it to the search engine.
Now that you’re familiar with the purpose of the XML sitemap and its general best practices, it’s helpful to be aware of common issues that arise with the sitemap.