23 May 2009
Allow/Disallow Search Engine Bots Using Robots.txt [With Examples]
By admin. Posted in SEO
The robots.txt file is a text file containing directives for search engine crawlers, telling them which pages may or may not be indexed.
Every search engine crawler therefore begins exploring a website by looking for robots.txt at the root of the site.
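To make "at the root of the site" concrete, here is a minimal Python sketch that derives the robots.txt address from any page address. The function name robots_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; the path is always /robots.txt,
    # and any query string or fragment is dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/blog/2009/05/post.html?page=2"))
```

Whatever page a crawler starts from, the file it looks for sits directly under the host name, never inside a subdirectory.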
Format of robots.txt:
The robots.txt file (the name is written in lower case and in the plural) is an ASCII file that sits at the root of the site and may contain the following commands:
- User-Agent: (value)
Specifies the robot to which the following directives apply.
(value) can be * meaning “all search engines”, Googlebot for the Google search engine bot, Yahoo-slurp for the Yahoo search engine bot, Msnbot for the MSN search engine bot, and so on for other search engine bots that follow the robots.txt standard.
- Allow: (value)
Specifies the pages to include for indexing.
- Disallow: (value)
Specifies the pages to exclude from indexing. Each page or path to exclude must be on its own line and must begin with /. The value / on its own means “all pages.”
Note: Do not put a blank line between a User-Agent line and the rules that follow it, because a crawler may treat the blank line as the end of that block!
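As a rough illustration of how these three commands interact, the sketch below feeds a small rule set to Python's standard urllib.robotparser module. The rule set, the bot name ExampleBot and the example.com addresses are invented for the illustration. This parser applies the first rule that matches, and real crawlers may resolve a conflict between Allow and Disallow differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: everything is excluded except /public/.
RULES = """\
User-Agent: *
Allow: /public/
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)
public_ok = parser.can_fetch("ExampleBot", "http://www.example.com/public/index.html")
private_ok = parser.can_fetch("ExampleBot", "http://www.example.com/private/index.html")
print(public_ok, private_ok)
```

Putting the Allow line before the broader Disallow line keeps the result the same under first-match parsers like this one.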
Examples of robots.txt:
- Exclusion of all pages:
User-Agent: *
Disallow: /
- Exclusion of no page (equivalent to the absence of robots.txt; all pages are visited):
User-Agent: *
Disallow:
- Authorization of a single robot (for example, Googlebot):
User-Agent: Googlebot
Disallow:
User-Agent: *
Disallow: /
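One way to sanity-check the single-robot rules above is to run them through Python's standard urllib.robotparser module. This is only a sketch: SomeOtherBot and the example.com address are placeholders, and the result shows how this particular parser reads the rules, not how every crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The "single robot" rules from the example above.
RULES = [
    "User-Agent: Googlebot",
    "Disallow:",
    "User-Agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(RULES)

url = "http://www.example.com/index.html"  # placeholder address
googlebot_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)  # hypothetical bot name
print(googlebot_allowed, other_allowed)
```

The empty Disallow value grants Googlebot full access, while every other robot falls through to the * block and is refused.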
- Exclusion of a single robot (for example, the MSN bot):
User-Agent: Msnbot
Disallow: /
User-Agent: *
Disallow:
- Exclusion of one page:
User-Agent: *
Disallow: /directory/path/page.html
- Exclusion of several pages:
User-Agent: *
Disallow: /directory/path/page.html
Disallow: /wp-admin/admin/page2.html
Disallow: /wp-admin/settings/page3.html
- Exclusion of all pages of a directory and its subfolders:
User-Agent: *
Disallow: /directory/
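The page and directory exclusions above both rely on the same idea: under the original robots.txt convention, a Disallow value is compared against the beginning of the requested path. The helper below is a simplified sketch of that prefix test. The name is_excluded is made up, and wildcard extensions supported by some crawlers are deliberately left out.

```python
def is_excluded(path: str, disallow_values: list[str]) -> bool:
    """Return True if any non-empty Disallow value is a prefix of path."""
    # An empty Disallow value excludes nothing, so it is skipped.
    return any(value != "" and path.startswith(value) for value in disallow_values)


rules = ["/directory/"]
print(is_excluded("/directory/sub/page.html", rules))  # inside the directory
print(is_excluded("/directory.html", rules))           # a different path
```

This is why the trailing slash matters: /directory/ covers the folder and its subfolders, but not a page that merely starts with the same letters.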
- Allow only the Google, Yahoo and MSN bots and disallow all others:
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
User-agent: Yahoo-slurp
Allow: /
User-agent: Msnbot
Allow: /
- Compact version (I found this one on a forum, so I am not completely sure about it):
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:
User-agent: *
Disallow: /
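Since the compact version is uncertain, one way to probe it is to feed it to Python's standard urllib.robotparser module and ask who may fetch a page. This is a sketch under assumptions: SomeOtherBot and the example.com address are placeholders, and the outcome only shows how this parser groups consecutive User-agent lines.

```python
from urllib.robotparser import RobotFileParser

# The compact rules: several User-agent lines share one empty Disallow.
RULES = [
    "User-agent: Googlebot",
    "User-agent: Slurp",
    "User-agent: msnbot",
    "User-agent: Mediapartners-Google*",
    "User-agent: Googlebot-Image",
    "User-agent: Yahoo-MMCrawler",
    "Disallow:",
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(RULES)

url = "http://www.example.com/index.html"  # placeholder address
results = {
    bot: parser.can_fetch(bot, url)
    for bot in ("Googlebot", "Slurp", "msnbot", "SomeOtherBot")
}
print(results)
```

With this parser the listed robots share the single empty Disallow line and are let in, while any robot not listed falls under the * block.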





