
Robots.txt file

If you have ever looked at the visitor statistics for your site, you have probably noticed that various search engines visit it from time to time. Naturally, these visitors are not people but special programs, often called "robots". While browsing the site, these robots index the web resource so that it can later be found through the search engine whose robot did the indexing.

All "robots" before indexing a resource are looking for a file in the root directory of your site with the name robots.txt. This file contains information on which robots files can be indexed, but which ones are not. This is useful in cases when you do not want some pages to be indexed, for example, containing "closed" information.

The robots.txt file must be a plain text file with Unix line endings. Many editors can convert ordinary Windows files, and sometimes your FTP client can do this during upload. The file consists of records, each of which contains a pair of fields: a line with the name of the client application (User-agent) and one or several lines starting with the Disallow directive:
<Field> ":" <value>

The User-agent line contains the name of the robot. For example:
User-agent: googlebot

If you want to address all robots at once, you can use the wildcard "*":
User-agent: *

The names of the robots can be found in your web server's access logs.

The second part of the record consists of Disallow lines. These lines are directives for the robot named above: they tell it which files and/or directories it is forbidden to index. For example:
Disallow: email.htm

The directive may also contain a directory name:
Disallow: /cgi-bin/

Disallow directives can also use partial names (path prefixes). According to the standard, the directive Disallow: /bob will prevent spiders from indexing both /bob.html and /bob/index.html.
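As an illustration (the paths here are hypothetical), the trailing slash decides whether only a directory or every path beginning with that prefix is blocked:

Disallow: /bob
Disallow: /bob/

The first line blocks /bob.html, /bob/index.html and anything else whose path starts with /bob; the second blocks only the contents of the /bob/ directory.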

If the Disallow directive is left empty, it means that the robot may index all of the site's files. Each User-agent field must have at least one Disallow directive for robots.txt to be considered valid. A completely empty robots.txt is treated the same as if it did not exist at all.
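Putting these pieces together, a complete robots.txt might look like the following sketch (the robot name and paths are only illustrative; lines starting with "#" are comments):

# forbid a specific robot from the cgi-bin directory
User-agent: googlebot
Disallow: /cgi-bin/

# all other robots may index everything
User-agent: *
Disallow: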