The mod_rewrite module, part 2

In the last chapter we got acquainted with the basics of the mod_rewrite module. The example we considered used a construct that literally means: "If someone tries to access the .htaccess file, return an error stating that access to the file is forbidden."

This "rule" is global, that is, every visitor receives the specified error message. As a reminder, mod_rewrite is a module that provides a "rule-based mechanism for dynamically changing the requested URLs."

We can restrict a "rule" with one or more "rule conditions". The rule is then applied only if the conditions that precede it are met.

Syntax: the condition must precede the rule!

Let's take another example (an entry in the .htaccess file):

    RewriteEngine on
    Options +FollowSymlinks
    RewriteBase /
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon
    RewriteRule ^.*$ - [F]

The purpose of the first three lines was discussed in detail in the first part of this series. Their function is to enable the "rewriting engine", that is, the module itself.

The last two lines deny access to a robot that identifies itself with the user agent name "EmailSiphon". This robot harvests email addresses from web pages.

The line:

    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon

consists of three parts:

Directive (name): RewriteCond
Test string: %{HTTP_USER_AGENT}
Condition pattern: ^EmailSiphon

The test string is a server variable, written in the general form "%{VARIABLE_NAME}".
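As a rough illustration of this three-part structure, the sketch below splits a RewriteCond line on whitespace in Python. The function name split_rewrite_cond is hypothetical, and a plain whitespace split is a simplification that ignores quoted arguments; it is not how Apache itself parses the directive.

```python
def split_rewrite_cond(line):
    """Split a RewriteCond line into its three parts (simplified sketch)."""
    parts = line.split()
    # parts[0] is the directive, parts[1] the test string, parts[2] the pattern.
    return {
        "directive": parts[0],
        "test_string": parts[1],
        "pattern": parts[2],
    }

cond = split_rewrite_cond("RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon")
print(cond["directive"])    # RewriteCond
print(cond["test_string"])  # %{HTTP_USER_AGENT}
print(cond["pattern"])      # ^EmailSiphon
```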

The condition pattern is a regular expression. To understand it properly, let's look at regular expressions in general.

Regular expressions

Regular expressions are a mechanism for describing a pattern and searching a given text for data that matches it. Supporting functions built on top of them can return the matches as an array of strings, replace text according to a pattern, split a string by a pattern, and so on. Their main function, on which all the others are based, is searching text for data that matches the pattern described in regular expression syntax.

Regular expressions are similar to a small, compact programming language with its own rules.

For example, the substitution expression:

    s/abc/xyz/g

replaces every occurrence of the string "abc" with the string "xyz" throughout the text.
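The same substitution can be sketched in Python, used here only as a convenient stand-in for a tool that understands the s/.../.../g notation. The sample text is made up for illustration; re.sub replaces all occurrences by default, which corresponds to the g modifier.

```python
import re

text = "abc, then abc again"
# Equivalent of s/abc/xyz/g: replace every occurrence of "abc" with "xyz".
result = re.sub(r"abc", "xyz", text)
print(result)  # xyz, then xyz again
```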

Here is a brief overview of the most important elements with some examples:

. (dot) - any single character
| - alternation (e.g. /abc|def/)
* - quantifier (any number of repetitions, including none)
^ $ - line anchors (start and end of the line)
s - operator (replace string1 with string2)
g - modifier (search the whole text)
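The elements in this list can be tried out with a short Python sketch. Python's re module is assumed here to behave like the engine used by Apache for patterns this simple; the sample strings are invented for illustration.

```python
import re

# "." matches any single character.
dot_ok = re.search(r"a.c", "abc") is not None

# "|" matches either alternative.
alternatives = re.findall(r"abc|def", "abc def ghi")

# "*" lets the preceding element repeat any number of times, including zero.
star_ok = [re.fullmatch(r"ab*c", s) is not None for s in ("ac", "abc", "abbbc")]

# "^" and "$" tie the pattern to the start and end of the string.
anchored_ok = re.search(r"^abc$", "abc") is not None
anchored_fail = re.search(r"^abc$", "xabc") is not None

print(dot_ok, alternatives, star_ok, anchored_ok, anchored_fail)
# True ['abc', 'def'] [True, True, True] True False
```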

Regular expressions are built from these elements and other "ordinary" characters. They are not a standalone language but are used within other tools, for example programming languages such as Perl or PHP, as well as text editors (Emacs).

In mod_rewrite, regular expressions are used in the RewriteRule and RewriteCond directives.

"^" denotes the beginning of the line. It follows that the user agent must start with the string "EmailSiphon" and nothing else ("NewEmailSiphon", for example, would not match).

However, since this regular expression does not contain a "$" character (the end-of-line anchor), the user agent could also be, for example, "EmailSiphon2".
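The effect of the anchors on the user agent check can be verified with a small Python sketch. The helper agent_matches is a hypothetical name, and Python's re module is assumed to treat "^" and "$" the same way as Apache does for these patterns.

```python
import re

def agent_matches(pattern, user_agent):
    """Return True if the pattern is found in the user agent string."""
    return re.search(pattern, user_agent) is not None

print(agent_matches(r"^EmailSiphon", "EmailSiphon"))      # True
print(agent_matches(r"^EmailSiphon", "EmailSiphon2"))     # True: no end anchor
print(agent_matches(r"^EmailSiphon", "NewEmailSiphon"))   # False: must start the string
print(agent_matches(r"^EmailSiphon$", "EmailSiphon2"))    # False: "$" forbids extra text
```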

The last line of our example:

    RewriteRule ^.*$ - [F]

determines what happens when the robot requests access.

The regular expression "^.*$" matches every requested file name, so together with the rest of the rule it means: "Access to all files is forbidden". The "-" stands for "no substitution", and the [F] flag returns the "forbidden" error.

The dot "." in the regular expression is a metacharacter (wildcard) that stands for any single character.

"*" means that the preceding character may occur any number of times, including not at all. In this case an error is returned regardless of the name of the requested file.

"EmailSiphon", of course, is not the only mail collector. Another well-known member of this family is "ExtractorPro". Suppose we want to deny access to this robot as well. In this case, we need one more condition.

Now the .htaccess file will look like this:

    RewriteEngine on
    Options +FollowSymlinks
    RewriteBase /
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro
    RewriteRule ^.*$ - [F]

The third argument, [OR] (on the first RewriteCond line), is called a "flag". The two flags covered here are:

NC - "no case": the match is not case sensitive.
OR - "or next": this condition or the following one must be met.

The NC flag makes the pattern ignore letter case. For example:

    RewriteCond %{HTTP_USER_AGENT} ^emailsiphon [NC]

This line specifies that "emailsiphon" and "EmailSiphon" are treated as identical.

You can use several flags at the same time, separated by commas (without spaces):

    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro
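The combined effect of the two flags can be approximated in Python. This is only a sketch: the list BLOCKED_AGENT_PATTERNS and the function is_blocked are hypothetical names, any() stands in for a chain of conditions joined by OR, and the ignore_case switch applies the NC behaviour to every pattern at once, whereas in .htaccess the flag is set per condition.

```python
import re

# Patterns that would each sit in their own RewriteCond line, joined by [OR].
BLOCKED_AGENT_PATTERNS = [r"^EmailSiphon", r"^ExtractorPro"]

def is_blocked(user_agent, ignore_case=False):
    """Return True if any pattern matches; ignore_case mimics the NC flag."""
    flags = re.IGNORECASE if ignore_case else 0
    return any(re.search(p, user_agent, flags) for p in BLOCKED_AGENT_PATTERNS)

print(is_blocked("ExtractorPro 1.0"))               # True
print(is_blocked("emailsiphon"))                    # False: case differs
print(is_blocked("emailsiphon", ignore_case=True))  # True: case ignored
print(is_blocked("Mozilla/5.0"))                    # False
```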

There is no limit on the number of conditions, so you can block 10, 100, 1000 or more known mail collectors. How many conditions you define is simply a question of server load and of how readable the ".htaccess" file remains.

The example above uses the server variable "HTTP_USER_AGENT". Other variables are also available, for example:

REMOTE_HOST
REMOTE_ADDR

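For example, if you want to block a spider coming from www.site.ru, you can use the "REMOTE_HOST" variable like this:

    RewriteCond %{REMOTE_HOST} ^www\.site\.ru$
    RewriteRule ^.*$ - [F]

The dots are escaped with a backslash so that each one matches a literal dot rather than any character.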

If you want to block a specific IP address, the condition will look like this:

    RewriteCond %{REMOTE_ADDR} ^212\.37\.64\.10$
    RewriteRule ^.*$ - [F]

To match one exact, complete IP address, the regular expression needs both the start and the end anchor.
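A short Python sketch shows why both anchors matter when a single address is meant. The helper name matches and the test addresses are made up for illustration, and Python's re module is assumed to behave like Apache's engine for these patterns. The last line also shows that an unescaped dot acts as a wildcard, which is why a literal dot is written as "\.".

```python
import re

EXACT_IP = r"^212\.37\.64\.10$"     # both anchors, dots escaped
PREFIX_ONLY = r"^212\.37\.64\.10"   # end anchor left out

def matches(pattern, addr):
    """Return True if the pattern is found in the address string."""
    return re.search(pattern, addr) is not None

print(matches(EXACT_IP, "212.37.64.10"))            # True
print(matches(EXACT_IP, "212.37.64.100"))           # False
print(matches(PREFIX_ONLY, "212.37.64.100"))        # True: nothing stops the match at "10"
print(matches(r"^212.37.64.10$", "212x37x64x10"))   # True: "." matches any character
```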

You can also exclude a whole range:

    RewriteCond %{REMOTE_ADDR} ^212\.37\.64\.
    RewriteRule ^.*$ - [F]

This example shows how you can block the range of IP addresses from 212.37.64.0 to 212.37.64.255.
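The range check can be tried out in Python as a rough sketch. The function in_blocked_range is a hypothetical name and the addresses are sample values; the pattern is written with escaped dots so that each dot is taken literally, and only the start of the address is anchored.

```python
import re

# Start anchor plus the first three octets; the last octet is left open.
BLOCKED_RANGE = r"^212\.37\.64\."

def in_blocked_range(addr):
    """Return True if the address begins with the blocked prefix."""
    return re.search(BLOCKED_RANGE, addr) is not None

print(in_blocked_range("212.37.64.0"))    # True
print(in_blocked_range("212.37.64.255"))  # True
print(in_blocked_range("212.37.65.1"))    # False
print(in_blocked_range("213.37.64.1"))    # False
```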

And here is a small puzzle to test what you have learned (the solution will be given in the next part):

    RewriteCond %{REMOTE_ADDR} ^212\.37\.64
    RewriteRule ^.*$ - [F]

Attention, a question!

If we write "^212\.37\.64" in the regular expression instead of "^212\.37\.64\." (with a dot at the end), will this have the same effect, and will the same IP addresses be excluded?

So far we have used a simple RewriteRule that generates an error message. In the third part of this series, we will look at how RewriteRule can be used to redirect visitors to specific files.