The robots.txt file is used by site owners to stop search engines from crawling certain parts of a blog or website, like the admin dashboard, the admin folder and so on. A robots.txt file always resides at the root of the domain, for instance yourdomain.com/robots.txt.
When search engine bots visit a blog or website, they have limited resources to crawl it, and if those resources run out they may stop crawling, which could affect your blog or website's indexing. Also, there are many parts of your blog or website that you may not want search engines to crawl, and with robots.txt you can direct search engine crawlers away from those areas. This would speed up the crawling of your blog or website and also help the deeper crawling of your inner pages.
To check your domain's robots.txt file, go to www.domain.com/robots.txt. On most blogging platforms you may see a blank robots.txt file, but you can also check your domain's robots.txt file using Google Webmaster Tools.
To do this in Google Webmaster Tools:
Go to Site configuration
Click on Crawler access
To avoid duplicate content, the structure should look like this:
User-agent: *
Disallow: /wp-
Disallow: /trackback/
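For a typical WordPress blog, a slightly fuller version might look like the sketch below. The folder names are assumptions based on a default WordPress install, and the sitemap URL is a placeholder, so adjust both to match your own setup:
User-agent: *
# Block WordPress system folders (assumed default install paths)
Disallow: /wp-admin/
Disallow: /wp-includes/
# Block feeds and trackbacks to avoid duplicate content
Disallow: /feed/
Disallow: /trackback/
# Placeholder sitemap URL; replace with your own
Sitemap: http://www.yourdomain.com/sitemap.xml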
The robots.txt file only stops search engines from crawling parts of your blog or website; it does not stop them from indexing. It will only prevent areas like your feeds, trackbacks and admin folder, as well as certain pages and comments, from being crawled. If you have pages or posts that you do not want search engines to index, you can use the Yoast SEO plugin to add noindex tags to those pages or posts on your blog or website.
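When a plugin like Yoast SEO marks a page as noindex, what it does is output a robots meta tag in the page's head section, roughly like the line below. The exact attributes can vary by plugin and version, so treat this as an illustration rather than the plugin's exact output:
<meta name="robots" content="noindex, follow" />
Unlike a Disallow rule in robots.txt, this tag lets crawlers visit the page but tells them not to add it to their index.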
Hope this helps! Do share!