You've researched high-value target keywords and created relevant content, but no traffic is coming to your site. What's wrong?
The problem may not even be the content itself, but rather technical factors on the site. The most common technical SEO issues that search engine spiders encounter involve crawling the site.
Crawlability issues can sink any SEO effort — Googlebot needs to crawl and index your site properly for your web pages to rank in the search results, after all.
In addition to preventing your site from being crawled by search engines, these technical SEO issues likely affect user experience as well. For instance, if spiders can't follow your website's paths, your users probably can't either.
Not to mention, your site needs to be crawled efficiently to make the most of your crawl budget.
Running crawls can surface potential crawlability issues, letting you get ahead of them before search engines have problems reading and indexing your content. We recommend running two types of crawls with a crawler tool:
1. A crawl of the site that starts from the home page. Let the crawler loose on the site to mimic Google's web crawler (Googlebot).
2. A crawl of your SEO landing pages, ideally aligned with your XML sitemaps.
The data from these crawls will help you diagnose crawl problems and clue you in on whether your pages are in fact crawlable.
More insights will come from additional crawls with further variables: setting the user agent to Googlebot, crawling as a mobile device to see the mobile experience, and rendering JavaScript instead of just the HTML.
(You can save time by saving these settings and scheduling future, recurring crawls.)
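To illustrate how the user-agent variable changes what a crawler sees, here is a minimal sketch in Python (using the widely available requests library; the URL and user-agent string are only examples, not values from this guide) that fetches the same page as a default client and as a Googlebot smartphone crawler:

import requests

URL = "https://www.example.com/"  # placeholder URL; replace with one of your own pages
GOOGLEBOT_MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
    "Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# Fetch the page as a regular client and as Googlebot's smartphone crawler
default_response = requests.get(URL, timeout=10)
googlebot_response = requests.get(URL, headers={"User-Agent": GOOGLEBOT_MOBILE_UA}, timeout=10)

# If the status codes or page sizes differ, the site is serving
# different content to Googlebot than to regular visitors.
print("Default UA:  ", default_response.status_code, len(default_response.text))
print("Googlebot UA:", googlebot_response.status_code, len(googlebot_response.text))

A crawler tool automates this across the whole site; the point of the sketch is simply that the same URL can respond differently depending on who asks for it.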
Follow our guide to crawling enterprise sites, or request a free site audit to examine the technical integrity of your site.
A crawl report from an enterprise-level site can return a lot of data. These sites may contain thousands or even millions of pages!
Not all crawl errors carry the same weight, though. We've separated crawl issues into three categories (high-, mid-, and low-priority) so you can follow along, prioritize, and resolve the issues affecting your site's crawlability.
The first thing a bot will look for on your site is your robots.txt file. You can direct Googlebot with "Disallow" directives that specify which pages you don't want it to crawl.
User-agent: Googlebot
Disallow: /example/
The robots.txt file is most often the cause of a site's crawlability problems. The directives in this file could block Google from crawling your most important pages, or allow it to crawl pages you want kept out.
How to find:
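For a quick spot-check, here is a minimal sketch in Python using the standard library's urllib.robotparser to test whether your key landing pages are blocked for Googlebot (the URLs below are placeholders):

from urllib.robotparser import RobotFileParser

# Placeholder URLs; swap in your own domain and key landing pages
ROBOTS_URL = "https://www.example.com/robots.txt"
KEY_PAGES = [
    "https://www.example.com/",
    "https://www.example.com/important-landing-page/",
]

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # downloads and parses the live robots.txt file

for page in KEY_PAGES:
    if parser.can_fetch("Googlebot", page):
        print("OK      ", page)
    else:
        print("BLOCKED ", page)

Any page that shows up as blocked here is one Googlebot will not crawl, no matter how good the content is.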
These problems could stem from a mistake in regex code or a simple typo, either of which can cause major problems.
Like being blocked, it's a big problem if Google arrives at a page and encounters these errors. A web crawler travels through the web by following links; once it hits a 404 or 500 error page, it reaches a dead end. When a bot hits a large number of error pages, it will eventually give up crawling the page, and then your site as a whole.
How to find:
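A crawler tool will report these errors across the whole site, but as a rough sketch of the idea, this Python snippet (using the requests library; the URLs are placeholders) checks a list of pages for 4xx and 5xx responses:

import requests

# Placeholder list; in practice these URLs would come from your crawl or sitemap
URLS = [
    "https://www.example.com/",
    "https://www.example.com/old-page/",
]

for url in URLS:
    try:
        response = requests.get(url, timeout=10, allow_redirects=True)
    except requests.RequestException as error:
        print("FAILED  ", url, error)
        continue
    if response.status_code >= 400:
        # 404s and 500s are dead ends for a crawler following links
        print("ERROR   ", response.status_code, url)
    else:
        print("OK      ", response.status_code, url)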
Look for issues with the tags that act as directives to Google, such as canonical and hreflang tags. These tags could be missing, incorrect, or duplicated, potentially confusing crawlers.
How to find:
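To see what directives a single page is actually sending, a minimal sketch in Python (using the requests and BeautifulSoup libraries; the URL is a placeholder) can pull out the canonical and hreflang tags so you can check for missing, incorrect, or duplicated values:

import requests
from bs4 import BeautifulSoup

URL = "https://www.example.com/"  # placeholder page to inspect

html = requests.get(URL, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Collect canonical and hreflang link tags
canonicals = []
hreflangs = []
for link in soup.find_all("link"):
    rel = [value.lower() for value in (link.get("rel") or [])]
    if "canonical" in rel:
        canonicals.append(link.get("href"))
    if "alternate" in rel and link.get("hreflang"):
        hreflangs.append((link.get("hreflang"), link.get("href")))

# More than one canonical tag, or none at all, is a red flag
print("Canonical tags:", canonicals)
print("Hreflang tags: ", hreflangs)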
Note: Platform users can set rules to surface changes in these elements, flagged by "high priority" rules such as "Noindex detected" on pages where there shouldn't be one, which can have a major impact on the site. This is a great example of how site audit technology can scale SEO tasks.
Recommended Reading: Crawl Depth in SEO: How to Increase Crawl Efficiency
Google’s ability to render JavaScript is improving. Although Progressive Enhancement, where all of the content appears in the HTML source code, is still the recommended approach, it's useful to fully render pages the way Google now does when necessary, so you can experience what a searcher would find on the page.
How to find:
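One way to check is to compare the raw HTML with a fully rendered version of the page. This Python sketch (using the requests and Playwright libraries, both of which you would need to install; the URL and phrase are placeholders) flags content that only appears after JavaScript runs:

import requests
from playwright.sync_api import sync_playwright

URL = "https://www.example.com/"          # placeholder page to test
PHRASE = "important product description"  # placeholder text you expect on the page

# Raw HTML, as a non-rendering crawler would see it
raw_html = requests.get(URL, timeout=10).text

# Rendered HTML, after JavaScript has executed in a headless browser
with sync_playwright() as playwright:
    browser = playwright.chromium.launch()
    page = browser.new_page()
    page.goto(URL)
    rendered_html = page.content()
    browser.close()

in_raw = PHRASE in raw_html
in_rendered = PHRASE in rendered_html
if in_rendered and not in_raw:
    print("The phrase only appears after rendering; this content depends on JavaScript.")
else:
    print("Raw HTML:", in_raw, "| Rendered HTML:", in_rendered)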
Some issues stem from Google or other search engines not knowing which version of the content to index because of the way the site is coded. Examples include pages with many parameters in the URL, session IDs, redundant content elements, and pagination.
How to find:
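As a rough illustration, this Python sketch (standard library only; the URLs are placeholders standing in for a crawl export) groups crawled URLs by their parameter-free path, so pages that differ only by query strings or session IDs stand out:

from collections import defaultdict
from urllib.parse import urlsplit

# Placeholder URLs; in practice these would come from your crawl data
CRAWLED_URLS = [
    "https://www.example.com/shoes/",
    "https://www.example.com/shoes/?color=red",
    "https://www.example.com/shoes/?sessionid=12345",
    "https://www.example.com/shoes/?page=2",
]

groups = defaultdict(list)
for url in CRAWLED_URLS:
    parts = urlsplit(url)
    # Group by scheme + host + path, ignoring query strings and fragments
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    groups[base].append(url)

for base, variants in groups.items():
    if len(variants) > 1:
        print(f"{base} has {len(variants)} URL variations:")
        for variant in variants:
            print("  ", variant)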
Once you find these instances on your site, find ways to either stop the pages from being generated, adjust Google's access to them, or check that they carry the correct tags, such as canonical, noindex, or nofollow, so they don't interfere with your target landing pages.
Recommended Reading: Technical SEO: Best Practices to Prioritize Your SEO Tasks
How a website interlinks related posts is important for indexation. A page that is part of a clear website structure and is linked to from within content has little barrier to indexation.
How to find:
Be on the lookout for best-practice elements in this step, such as no internal 301 redirects, correct pagination, and complete sitemaps.
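As a quick sketch of the sitemap part of that check, this Python snippet (using the requests library and the standard library's XML parser; the sitemap URL is a placeholder) reads an XML sitemap and flags URLs that redirect or return errors, which helps catch internal 301s and out-of-date sitemap entries:

import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder sitemap
NAMESPACE = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

sitemap_xml = requests.get(SITEMAP_URL, timeout=10).text
root = ET.fromstring(sitemap_xml)

for loc in root.findall(".//sm:loc", NAMESPACE):
    url = loc.text.strip()
    response = requests.get(url, timeout=10, allow_redirects=False)
    if 300 <= response.status_code < 400:
        # Sitemap URLs should point at the final destination, not a redirect
        print("REDIRECT", response.status_code, url, "->", response.headers.get("Location"))
    elif response.status_code >= 400:
        print("ERROR   ", response.status_code, url)
    else:
        print("OK      ", response.status_code, url)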
Recommended Reading: How to Create a Sitemap and Submit to Google
Mobile usability is a key priority area for SEO with the roll-out of Google’s mobile-first index. If a site is deemed unusable on mobile devices, Google may drop its pages in the SERPs, which will result in lost traffic.
How to find:
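Crawling with a mobile user agent, as shown earlier, is one way to see the mobile experience. As a very rough first signal, this Python sketch (using the requests and BeautifulSoup libraries; the URL is a placeholder) checks whether a page declares a viewport meta tag, one basic prerequisite for a page that renders well on mobile:

import requests
from bs4 import BeautifulSoup

URL = "https://www.example.com/"  # placeholder page to check

html = requests.get(URL, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

viewport = soup.find("meta", attrs={"name": "viewport"})
if viewport is None:
    print("No viewport meta tag found; the page is unlikely to render well on mobile.")
else:
    print("Viewport meta tag found:", viewport.get("content"))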
If you've confirmed that your site doesn't have the issues outlined above but it still isn't being indexed, you may have "thin content." Google is aware of these pages; it just doesn't believe they are worthwhile to index.
The content on these pages may be boilerplate or duplicated elsewhere on your website. Or it may simply not be unique enough, or have no external signals validating its value or authority from news websites or other industry sites, i.e., no links pointing to it.
How to find:
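One rough proxy is word count. This Python sketch (using the requests and BeautifulSoup libraries; the URLs and the 300-word threshold are assumptions for illustration) counts the visible words on each page so unusually thin pages stand out for a manual review:

import requests
from bs4 import BeautifulSoup

# Placeholder URLs; in practice these would come from your crawl data
URLS = [
    "https://www.example.com/blog/long-guide/",
    "https://www.example.com/tag/shoes/",
]

for url in URLS:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style contents, then count the remaining visible words
    for element in soup(["script", "style"]):
        element.decompose()
    word_count = len(soup.get_text(separator=" ").split())
    flag = "THIN?" if word_count < 300 else "OK   "  # arbitrary illustrative threshold
    print(flag, word_count, "words", url)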
A website free of crawlability issues is a great place to be. Sites that achieve this enjoy relevant traffic from Google and other search engines, and can focus on bettering the search experience rather than fixing problems.
It's not easy, especially if you have limited time to dedicate to these crawlability problems. Spotting and fixing these issues can take effort from dozens of people: a web design team, developers, content writers, and other stakeholders.
This is why it's important to find the top problems affecting your performance, develop a plan to fix them, and set standards to prevent new issues in the future.
Learn more about Clarity Audits, our site audit technology with a built-in JS and HTML crawler, and how it identifies crawlability issues and performs technical health checks to keep your site fully optimized.
Editor's Note: This post was originally published in May 2018 and has been updated for accuracy and comprehensiveness.