robots.txt

Presentation of the robots.txt file

The robots.txt file is a text file used for the SEO (search engine optimization) of websites: it contains directives telling search engine crawlers which pages may or may not be indexed. Any search engine therefore begins crawling a website by looking for the robots.txt file at the root of the site.
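As an illustration, here is a minimal sketch, assuming Python and the hypothetical address www.example.com, of how a crawler can retrieve this file and check whether a given page may be visited, using the standard urllib.robotparser module:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # the file is always looked up at the root of the site
rp.read()  # download and parse the rules
# True if the rules allow a robot named "MyCrawler" to visit this page
print(rp.can_fetch("MyCrawler", "https://www.example.com/some/page.html"))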

robots.txt file format

The robots.txt file (written in lower case and in the plural) is an ASCII file placed at the root of the site and may contain the following directives:

  • User-Agent: specifies the robot to which the directives that follow apply. The value * means "all search engine robots".
  • Disallow: specifies the pages to exclude from indexing. Each page or path to exclude must appear on its own line and must begin with /. The value / on its own means "all pages of the site" (see the sketch after this list).
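To make these two directives concrete, here is a simplified Python sketch of the original prefix-matching rule (it ignores later extensions such as Allow lines or wildcards): a page is excluded as soon as its path begins with the value of a non-empty Disallow line.

def is_allowed(path, disallowed_prefixes):
    # A page is excluded when its path starts with a non-empty Disallow value.
    # An empty Disallow value excludes nothing; "Disallow: /" excludes every page.
    return not any(prefix and path.startswith(prefix) for prefix in disallowed_prefixes)

print(is_allowed("/directory/path/page.html", ["/"]))  # False: "Disallow: /" excludes all pages
print(is_allowed("/directory/path/page.html", [""]))   # True: an empty Disallow excludes nothing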




Here are some sample robots.txt files:

  • Exclusion of all pages:

User-Agent: *
Disallow: /
  • Exclusion of no pages (equivalent to having no robots.txt file; all pages are visited):

User-Agent: *
Disallow:
  • Authorization of a single robot:

User-Agent: RobotName
Disallow:
User-Agent: *
Disallow: /
  • Excluding a robot:

User-Agent: RobotName
Disallow: /
User-Agent: *
Disallow:
  • Excluding a page:

User-Agent: *
Disallow: /directory/path/page.html
  • Exclusion of several pages:

User-Agent: *
Disallow: /directory/path/page.html
Disallow: /directory/path/page2.html
Disallow: /directory/path/page3.html
  • Exclusion of all pages of a directory and its subdirectories:

User-Agent: *
Disallow: /directory/
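As a quick check, the "Authorization of a single robot" example above can be verified with Python's standard urllib.robotparser module (RobotName and the URL are only placeholders):

from urllib import robotparser

rules = """\
User-Agent: RobotName
Disallow:

User-Agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())
print(rp.can_fetch("RobotName", "https://www.example.com/page.html"))  # True: the named robot may crawl everything
print(rp.can_fetch("OtherBot", "https://www.example.com/page.html"))   # False: every other robot is excluded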

Some User-Agents

Examples of User Agents for the most popular search engines:


Search engine          User-Agent
AltaVista              Scooter
Excite                 ArchitextSpider
Google                 Googlebot
HotBot                 Slurp
InfoSeek               InfoSeek Sidewinder
Lycos                  T-Rex
Voila                  Echo

For more information

  • Improving crawling by search engine robots
  • The web robots page

To go deeper

  • SEO training
