Beginner's Guide to Robots.txt Files
What is Robots.txt?
Robots.txt is a simple plain-text file that website owners use to tell
robots and crawlers not to crawl specific webpages or the whole website. It is
aimed mostly at search engine crawlers, but sometimes at malicious software as well.
Where Is It Located?
Robots.txt is always located in the root directory of your
website. If you've created the Robots.txt file and placed it in the right
path, its URL will look like this: http://www.examplewebste.com/robots.txt
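Because the file always sits in the root directory, its address can be worked out from any page URL on the same site. The snippet below is a minimal sketch of that idea; the helper name robots_url is made up for this illustration, and the page URL is only a sample.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    robots_url is a hypothetical helper written for this guide.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, point the path at the root-level file,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.examplewebste.com/shop/fake.html"))
# prints http://www.examplewebste.com/robots.txt
```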
How to Create It?
Open Notepad in Windows and save a new file named robots. Notepad saves it as a text document, so the file becomes robots.txt.
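If you would rather create the file with a script than with Notepad, something like the following should work. It is only a sketch: create_robots_file is a hypothetical helper, and the temporary folder stands in for your website's real root directory.

```python
import tempfile
from pathlib import Path


def create_robots_file(site_root: Path) -> Path:
    """Create an empty robots.txt inside site_root and return its path.

    create_robots_file is a hypothetical helper written for this guide.
    """
    robots_path = site_root / "robots.txt"
    # touch() creates the file if it is missing and leaves an existing one alone.
    robots_path.touch(exist_ok=True)
    return robots_path


# A temporary folder is used here as a stand-in for the website root.
with tempfile.TemporaryDirectory() as folder:
    created = create_robots_file(Path(folder))
    print(created.name)
```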
How to Restrict Robots?
Open your saved Robots.txt file and start writing paths and
restrictions as given below:
User-agent: *
Disallow: /
The directives given above will stop crawlers from crawling your whole
website. How? Let's read further:
“User-agent:” This directive names the crawlers and robots that the
restrictions written under it apply to.
“*” The asterisk next to User-agent: is a wildcard. It makes the
restrictions apply to every crawler on the web.
“Disallow:” This directive names the files and directories of your
website that crawlers are told not to crawl.
“/” The forward slash stands for the root directory of your website,
so disallowing it restricts every page on the site.
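One way to check what a set of rules does is to feed it to Python's standard urllib.robotparser module and ask whether a given crawler may fetch a given URL. The sketch below assumes the two-line file from this section; the crawler name ExampleBot and the page URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same two lines described in this section.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so nothing is downloaded.
parser.parse(rules.splitlines())

# "ExampleBot" is a made-up crawler name; the wildcard covers it.
print(parser.can_fetch("ExampleBot", "http://www.examplewebste.com/"))
print(parser.can_fetch("ExampleBot", "http://www.examplewebste.com/shop/fake.html"))
# Both calls print False, because "/" matches every path on the site.
```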
If you want to restrict only specific webpages of your
website, write their paths in place of the forward slash. For example, to
restrict fake.html in the /shop directory of your website, write
“Disallow: /shop/fake.html”. To restrict the whole /shop directory, write
“Disallow: /shop/”. The trailing forward slash makes the rule cover
everything inside that directory, so search engine robots and crawlers are told to stay out of all of it.
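The difference between restricting one page and restricting a whole directory can be sketched with the same standard urllib.robotparser module. The helper build_parser, the crawler name ExampleBot, and the pages other.html and about.html are assumptions made for this example only.

```python
from urllib.robotparser import RobotFileParser


def build_parser(rules_text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


site = "http://www.examplewebste.com"

# Rule set 1: restrict a single page.
page_only = build_parser("User-agent: *\nDisallow: /shop/fake.html\n")
print(page_only.can_fetch("ExampleBot", site + "/shop/fake.html"))   # False
print(page_only.can_fetch("ExampleBot", site + "/shop/other.html"))  # True

# Rule set 2: restrict the whole /shop/ directory.
whole_dir = build_parser("User-agent: *\nDisallow: /shop/\n")
print(whole_dir.can_fetch("ExampleBot", site + "/shop/other.html"))  # False
print(whole_dir.can_fetch("ExampleBot", site + "/about.html"))       # True
```

Pages outside /shop/ stay crawlable in both cases, since no rule matches their paths.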
Any questions about this Beginner's Guide to Robots.txt or about advanced Robots.txt topics are welcome.