Robots.txt file

If you have ever looked at the visitor statistics for your site, you will have noticed that it is periodically visited by various search engines. These visitors are not people, of course, but special programs that are often called "robots". Robots browse the site and index it, so that it can later be found through the search engine whose robot did the indexing.

Before indexing a resource, every robot looks for a file named robots.txt in the root directory of your site. This file tells robots which files they may index and which they may not. This is useful when you do not want certain pages indexed, for example pages containing private information.
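Because the file always sits in the root directory, its address can be derived from any page address on the same site. The following is a minimal sketch of that step; the function name robots_txt_url and the example.com address are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # example.com is a placeholder host, not taken from the article.
    print(robots_txt_url("http://example.com/docs/page.html?id=1"))
```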

The robots.txt file must be a plain text file in Unix format, that is, with Unix line endings. Some editors can convert ordinary Windows files, and sometimes an FTP client can do the conversion. The file consists of records, each of which contains a pair of fields: a line with the name of the client application (User-agent), followed by one or more lines starting with the Disallow directive. Each line has the form:
<Field> ":" <value>
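To make the record structure concrete, here is a rough sketch of a reader that splits a robots.txt text into records. It is an illustration under stated assumptions, not a reference parser: the function name parse_robots_txt and the dictionary layout are hypothetical, and the treatment of blank lines as record separators and of "#" as a comment marker follows the original robots exclusion standard rather than anything stated above.

```python
def parse_robots_txt(text):
    """Split robots.txt text into records of user agents and Disallow values."""
    records = []
    current = None
    for raw_line in text.splitlines():  # accepts Unix and Windows line endings
        stripped = raw_line.strip()
        if not stripped:
            current = None  # assumption: a blank line ends the current record
            continue
        # Assumption: everything after "#" is a comment.
        line = stripped.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue  # comment-only or malformed line, skip it
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # A User-agent line that follows Disallow lines starts a new record.
            if current is None or current["disallow"]:
                current = {"user_agents": [], "disallow": []}
                records.append(current)
            current["user_agents"].append(value)
        elif field == "disallow" and current is not None:
            current["disallow"].append(value)
    return records
```

Each returned record is a dictionary with the robot names under "user_agents" and the raw Disallow values under "disallow"; an empty Disallow value is kept as an empty string.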

The User-agent line contains the name of the robot. For example:
User-agent: googlebot

To address all robots at once, use the wildcard character "*":
User-agent: *

Robot names can be found in the access logs of your web server.
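How a robot decides which record applies to it can be sketched as follows. This is only an illustration: the function name select_disallow_rules and the list-of-pairs input are hypothetical, and the case-insensitive substring comparison is the matching recommended by the original robots exclusion standard, not something described above.

```python
def select_disallow_rules(records, robot_name):
    """Return the Disallow values that apply to robot_name.

    records is a list of (user_agent, disallow_values) pairs. A record
    naming the robot wins; otherwise the "*" record is used, if present.
    """
    name = robot_name.lower()
    fallback = None
    for user_agent, disallow_values in records:
        if user_agent == "*":
            fallback = disallow_values  # remember the catch-all record
        elif user_agent and user_agent.lower() in name:
            return disallow_values  # a specific record takes priority
    return fallback if fallback is not None else []


if __name__ == "__main__":
    # Hypothetical records, written as (user agent, Disallow values).
    records = [("googlebot", ["/cgi-bin/"]), ("*", ["/private/"])]
    print(select_disallow_rules(records, "Googlebot/2.1"))
    print(select_disallow_rules(records, "otherbot"))
```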

The other part of a record consists of Disallow lines. These lines are directives for the robot named in the record: they tell it which files and/or directories it is prohibited from indexing. For example:
Disallow: /email.htm

A directive may also name a directory:
Disallow: /cgi-bin/

Disallow directives can also behave as if they contained wildcards, because each value is matched against the beginning of the path. The standard dictates that the /bob directive prohibits spiders from indexing /bob.html as well as /bob/index.html.
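The /bob example comes down to a simple prefix test, which might be sketched like this. The function name is_disallowed is hypothetical, and the sketch ignores details such as URL encoding that a real robot would have to handle.

```python
def is_disallowed(path, disallow_values):
    """Return True if path starts with any non-empty Disallow value."""
    for value in disallow_values:
        # An empty value prohibits nothing, so it is skipped.
        if value and path.startswith(value):
            return True
    return False


if __name__ == "__main__":
    for path in ("/bob.html", "/bob/index.html", "/alice.html"):
        print(path, is_disallowed(path, ["/bob"]))
```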

If a Disallow directive is left empty, the robot may index all files. At least one Disallow directive must be present for each User-agent field for robots.txt to be considered valid. A completely empty robots.txt means the same as having no robots.txt at all.
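These rules can be tried out with the urllib.robotparser module from the Python standard library. The sketch below is one possible way to check a rule set without fetching anything from the network; the wrapper function can_fetch, the sample rules, and the example.com addresses are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, robot_name, url):
    """Check url against robots.txt text using the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() takes a list of lines
    return parser.can_fetch(robot_name, url)


# Hypothetical rule set: googlebot is kept out of /cgi-bin/,
# every other robot gets an empty Disallow and may index everything.
RULES = """\
User-agent: googlebot
Disallow: /cgi-bin/

User-agent: *
Disallow:
"""

if __name__ == "__main__":
    print(can_fetch(RULES, "googlebot", "http://example.com/cgi-bin/test"))
    print(can_fetch(RULES, "otherbot", "http://example.com/cgi-bin/test"))
    print(can_fetch("", "otherbot", "http://example.com/page.html"))
```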