Robots.txt is a text file that webmasters create and place on their website to tell search engine crawlers (also known as bots) which pages or files the crawler can or cannot crawl.
A robots.txt file is not a way to completely keep a webpage out of the Google index. (If that is your objective, use noindex directives instead.)
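For reference, a noindex directive is usually added as a robots meta tag in the page's HTML head (or sent as an X-Robots-Tag HTTP response header). The tag itself is standard and looks like this:

<meta name="robots" content="noindex">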
The file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web and index content. A basic robots.txt file looks like this:
User-agent: *
Allow: /
Sitemap: http://www.example.com/sitemap.xml
In the example above, all user agents are allowed to crawl the entire site, and the Sitemap line points crawlers to the location of the site's XML sitemap.
User-agent: *
Disallow: /
In the example above, all user agents are blocked from crawling any part of the site.
Google's documentation walks through many more examples of robots.txt rules; a few common patterns are shown below.
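For illustration (the directory names here are placeholders), the following rules block every crawler from a single directory:

User-agent: *
Disallow: /private/

And these rules block only Google's main crawler, Googlebot, from one directory while leaving the rest of the site open to it:

User-agent: Googlebot
Disallow: /nogooglebot/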
You can view any site's robots.txt file (if it has one) by going to www.yoursite.com/robots.txt, since the file always sits at the root of the domain.
Here is how that looks for seoClarity’s domain:
There are several specifications for the robots.txt file that you should be aware of; knowing them will help you create your own file correctly.
When creating the file, use a plain text editor of your choice (not a word processor) and make sure it can save UTF-8 encoded text files.
Then, follow these recommendations so you can properly implement your file:
Before you get started creating your robots.txt file, there are a few terms you’ll need to be familiar with.
Although robots.txt files are extremely helpful, there are some common ways they can go wrong.
We’ve compiled a list of common robots.txt issues to help you better understand the nuances of the file and prevent any avoidable mistakes.