Beginner's Guide to Robots.txt Files
What is Robots.txt?
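Robots.txt is a plain text file that website owners use to tell robots and crawlers not to crawl specific webpages or the whole website. It is mainly aimed at search engine crawlers, but owners sometimes use it to try to keep out malicious software as well. Well-behaved crawlers follow these rules voluntarily, so malicious software may simply ignore them.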
Where Is It Located?
Robots.txt is always located in the root directory of your website. Once you have created the Robots.txt file and placed it in the right path, its URL will look like this: http://www.examplewebste.com/robots.txt
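Because the file always sits in the root directory, its URL can be worked out from any page address on the same site. The snippet below is a minimal sketch of that idea using Python's standard urllib.parse module; the function name robots_url and the sample page address are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page on the example site used in this guide.
    print(robots_url("http://www.examplewebste.com/shop/fake.html"))
```

Whatever page you start from, the result points at the same file in the root of the site.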
How to Create It?
Open Notepad on Windows, start a new file and save it as a text document with the name robots, so that it is stored as robots.txt.
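If you would rather create the file with a script than with Notepad, the following is a minimal sketch in Python. The function name create_robots_file and the site_root folder are hypothetical; site_root stands for whichever local folder holds the root directory of your website.

```python
from pathlib import Path


def create_robots_file(site_root: Path, rules: str = "") -> Path:
    """Create robots.txt inside site_root and return the path to it."""
    target = site_root / "robots.txt"
    # Write the rules as plain text; an empty string creates an empty file,
    # which is the same starting point as a freshly saved Notepad file.
    target.write_text(rules, encoding="utf-8")
    return target
```

Calling create_robots_file(Path(".")) would create an empty robots.txt in the current folder, ready for the rules to be written into it.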
How to Restrict Robots?
Open your saved Robots.txt file and write the paths and restrictions as shown below:
User-agent: *
Disallow: /
The two lines above stop robots from crawling your whole website. How? Read on:
“User-agent:” This directive defines which crawlers and robots the restrictions that follow apply to.
“*” The asterisk next to User-agent: is a wildcard. It applies the restrictions to all crawlers on the web.
“Disallow:” This directive defines the files and directories of your website that crawlers are not allowed to crawl.
“/” The forward slash stands for the root directory of your website, so disallowing it blocks crawlers from everything under it.
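To see how a crawler would read the rule explained above (all user agents, root directory disallowed), here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name ExampleBot, the helper is_allowed and the page addresses are assumptions made for illustration, and real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rule explained above: every crawler ("*") is disallowed from the root ("/").
RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may crawl url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical pages on the example site; both should be reported as blocked.
    for page in (
        "http://www.examplewebste.com/",
        "http://www.examplewebste.com/shop/fake.html",
    ):
        print(page, is_allowed(RULES, "ExampleBot", page))
```

Since every path on the site starts with the forward slash, the check fails for any page you pass in.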
If you want to restrict only specific webpages of your website, you can name them in place of this forward slash. For example, to restrict the page fake.html inside the /shop directory of your website, write “Disallow: /shop/fake.html”. To restrict the whole /shop directory, write “Disallow: /shop/”. The trailing forward slash covers everything inside that directory, so search engine robots and crawlers are kept out of all of it.
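The difference between restricting one page and restricting a whole directory can be checked with a short sketch, again using Python's standard urllib.robotparser module. The helper can_crawl, the crawler name ExampleBot and the pages /shop/other.html and /index.html are hypothetical and only serve as test cases.

```python
from urllib.robotparser import RobotFileParser

# Restrict a single page versus the whole /shop/ directory.
PAGE_ONLY = "User-agent: *\nDisallow: /shop/fake.html\n"
WHOLE_DIR = "User-agent: *\nDisallow: /shop/\n"


def can_crawl(rules: str, path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if user_agent may crawl the given path on the example site."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, "http://www.examplewebste.com" + path)


if __name__ == "__main__":
    for label, rules in (("page only", PAGE_ONLY), ("whole directory", WHOLE_DIR)):
        for path in ("/shop/fake.html", "/shop/other.html", "/index.html"):
            print(label, path, can_crawl(rules, path))
```

With the page rule only fake.html is blocked, while the directory rule blocks every path that begins with /shop/ and leaves the rest of the site open.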
Questions about this Beginner's Guide to Robots.txt, or about advanced Robots.txt topics, are welcome.