The mod_rewrite module, part 2

In the previous part we introduced the basics of the mod_rewrite module. The example we looked at used a construct that literally means: “If someone tries to access the .htaccess file, return an error stating that access to the file is denied.”

This “rule” is global, that is, everyone receives the specified error message. As a reminder, mod_rewrite is a module that provides a "rule-based mechanism for dynamically changing the requested URLs."

We can restrict a “rule” with one or more “rule conditions”. The rule is then applied only if the conditions placed before it are met.

Syntax: the conditions must precede the rule!

Take another example (an entry in the .htaccess file):

RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon
RewriteRule ^.*$ - [F]

The purpose of the first three lines was covered in detail in the first part of this series. Together they enable and set up the “rewrite engine”, that is, the module itself.

The last two lines deny access to the robot that identifies itself as “EmailSiphon” (this is its user-agent name). This robot harvests e-mail addresses from web pages.

The line

RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon

consists of three parts:

Directive: RewriteCond
Test string: %{HTTP_USER_AGENT}
Condition pattern: ^EmailSiphon

The test string (the second part) is a server variable, written in the general form "%{VARIABLE_NAME}".
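The three-part structure described above can be sketched in a few lines of Python. This is only an illustration: the helper names `split_rewrite_cond` and `variable_name` are made up for this example, and splitting on whitespace is a simplification of how Apache really parses its configuration (it ignores quoting, for instance).

```python
import re

def split_rewrite_cond(line):
    """Split a simple RewriteCond line into its three parts (hypothetical helper)."""
    directive, test_string, pattern = line.split()[:3]
    return directive, test_string, pattern

def variable_name(test_string):
    """Return NAME from a test string of the form %{NAME}, or None otherwise."""
    m = re.fullmatch(r"%\{(\w+)\}", test_string)
    return m.group(1) if m else None

directive, test_string, pattern = split_rewrite_cond(
    "RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon"
)
print(directive)                   # RewriteCond
print(test_string)                 # %{HTTP_USER_AGENT}
print(pattern)                     # ^EmailSiphon
print(variable_name(test_string))  # HTTP_USER_AGENT
```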

The condition pattern (the third part of the line) is a regular expression. To understand the topic fully, we need to look at regular expressions in general.

Regular expressions

Regular expressions are a mechanism for describing a pattern for a string and searching a given text for data that matches this pattern. Additional functions built on top of them let you obtain the matches as an array of strings, replace text by pattern, split a string by pattern, and so on. Their main function, however, on which all the others are based, is searching the text for data that matches the pattern described in regular expression syntax.

Regular expressions are like a small, compact programming language with its own rules.

For example, the substitution expression (in Perl or sed syntax):

s/abc/xyz/g

replaces every occurrence of the string “abc” with the string “xyz” throughout the text.
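The same substitution can be tried out in Python, whose `re.sub` replaces every match by default. This is a minimal sketch; the sample text is invented for the illustration.

```python
import re

text = "abc, then abc again"

# Equivalent of s/abc/xyz/g: every occurrence is replaced.
replaced_all = re.sub(r"abc", "xyz", text)

# Limiting the replacement to the first match, for comparison.
replaced_first = re.sub(r"abc", "xyz", text, count=1)

print(replaced_all)    # xyz, then xyz again
print(replaced_first)  # xyz, then abc again
```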

Here is a brief overview of the most important elements with some examples:

. (dot) - any single character
| - alternation (e.g. /abc|def/)
* - quantifier (the preceding element may occur any number of times, including zero)
^ $ - anchors (start and end of the string)
s - operator (replace string1 with string2)
g - modifier (apply to the entire text)
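The matching elements from this overview behave the same way in Python's `re` module, so they can be checked with a short sketch. The sample strings are invented for the illustration.

```python
import re

def hits(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    return [s for s in candidates if re.search(pattern, s)]

# "." stands for exactly one character.
dot_hits = hits(r"a.c", ["abc", "a-c", "ac"])

# "|" matches either alternative.
alt_hits = hits(r"abc|def", ["abc", "def", "xyz"])

# "*" repeats the preceding element zero or more times.
star_hits = hits(r"^ab*c$", ["ac", "abc", "abbbc", "adc"])

# "^" and "$" tie the pattern to the start and end of the string.
anchor_hits = hits(r"^abc$", ["abc", "xabc", "abcx"])

print(dot_hits)     # ['abc', 'a-c']
print(alt_hits)     # ['abc', 'def']
print(star_hits)    # ['ac', 'abc', 'abbbc']
print(anchor_hits)  # ['abc']
```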

Regular expressions are built from these elements and other "ordinary" characters. They are not a standalone language, but are used within other tools, such as programming languages like Perl or PHP, or text editors such as Emacs.

As for the mod_rewrite module, regular expressions are used in the RewriteRule and RewriteCond directives.

“^” marks the beginning of the string. It follows that the User-Agent must begin with the string "EmailSiphon" and nothing else ("NewEmailSiphon", for example, would not match).

However, since this regular expression does not contain the "$" character (the end-of-string anchor), the User-Agent could also be, for example, "EmailSiphon2".
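The effect of the two anchors can be checked with a small Python sketch. Python's `re` module is used here only as a stand-in for the regex engine in Apache, and the list of user agents is invented for the illustration.

```python
import re

agents = ["EmailSiphon", "EmailSiphon2", "NewEmailSiphon"]

# Start anchor only: the string must begin with "EmailSiphon".
start_anchored = {a: bool(re.search(r"^EmailSiphon", a)) for a in agents}

# Start and end anchor: the string must be exactly "EmailSiphon".
exact = {a: bool(re.search(r"^EmailSiphon$", a)) for a in agents}

for agent in agents:
    print(agent, start_anchored[agent], exact[agent])
# EmailSiphon True True
# EmailSiphon2 True False
# NewEmailSiphon False False
```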

The last line of our example:

RewriteRule ^.*$ - [F]

determines what needs to be done when the robot requests access.

The regular expression "^.*$" matches any requested file name, so together with the [F] (forbidden) flag the rule means: "Access to all files is denied."

The dot "." in a regular expression is a metacharacter (wildcard) that stands for any single character.

“*” means that the preceding element may occur any number of times, including zero. In this case, an error is returned regardless of the name of the requested file.
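That "^.*$" really accepts every file name, including an empty one, can be confirmed with a short Python sketch. The requested paths below are assumed sample values, not taken from a real server.

```python
import re

rule_pattern = re.compile(r"^.*$")

# Assumed sample request paths; the empty string stands for a request
# to the directory itself.
requested = ["index.html", "images/logo.png", "a", ""]

matches = [bool(rule_pattern.search(path)) for path in requested]
print(matches)  # [True, True, True, True]
```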

EmailSiphon is certainly not the only mail collector. Another well-known member of this family is “ExtractorPro”. Suppose we want to deny access to this robot as well. In this case, we need another condition.

Now the .htaccess file will look like this:

RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro
RewriteRule ^.*$ - [F]

The third argument, [OR] (in the first RewriteCond line), is called a “flag”. The two most commonly used flags are:

NC - ignore letter case.
OR - means "or the following condition".
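Without the OR flag, consecutive conditions are combined with an implicit "and", so a rule guarded by both patterns would never fire for a single user agent. The difference can be sketched in Python with `any` and `all`. The function names are hypothetical and only model the logic, not Apache's implementation.

```python
import re

patterns = [r"^EmailSiphon", r"^ExtractorPro"]

def blocked_with_or(user_agent):
    # [OR]-chained conditions: one matching condition is enough.
    return any(re.search(p, user_agent) for p in patterns)

def blocked_with_and(user_agent):
    # Default behaviour without [OR]: every condition must match.
    return all(re.search(p, user_agent) for p in patterns)

for agent in ["EmailSiphon", "ExtractorPro 1.0", "Mozilla/5.0"]:
    print(agent, blocked_with_or(agent), blocked_with_and(agent))
# EmailSiphon True False
# ExtractorPro 1.0 True False
# Mozilla/5.0 False False
```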

The NC flag lets you ignore letter case in the pattern. For example:

RewriteCond %{HTTP_USER_AGENT} ^emailsiphon [NC]

With this flag, both "emailsiphon" and "EmailSiphon" match the pattern.
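In Python the closest counterpart to the NC flag is `re.IGNORECASE`, which makes it easy to compare case-sensitive and case-insensitive matching. The list of user agents is invented for the illustration.

```python
import re

agents = ["EmailSiphon", "emailsiphon", "EMAILSIPHON"]

# Default matching is case-sensitive.
case_sensitive = [bool(re.search(r"^emailsiphon", a)) for a in agents]

# re.IGNORECASE plays the role of the NC flag here.
case_insensitive = [
    bool(re.search(r"^emailsiphon", a, re.IGNORECASE)) for a in agents
]

print(case_sensitive)    # [False, True, False]
print(case_insensitive)  # [True, True, True]
```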

You can use several flags at once, separated by commas (without spaces):

RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro

There is no limit on the number of conditions, so you can block 10, 100, 1000 or more known mail collectors. Whether to define 1000 conditions is simply a question of server load and of keeping the .htaccess file readable.

The examples above use the server variable "HTTP_USER_AGENT". Other variables are available as well, for example:

REMOTE_HOST
REMOTE_ADDR

For example, if you want to block a spider coming from www.site.ru, you can use the variable “REMOTE_HOST” like this:

RewriteCond %{REMOTE_HOST} ^www.site.ru$
RewriteRule ^.*$ - [F]

If you want to block a specific IP address, the condition looks like this:

RewriteCond %{REMOTE_ADDR} ^212.37.64.10$
RewriteRule ^.*$ - [F]

To match an exact and complete IP address, the regular expression must use both the start and the end anchor.
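Why the end anchor matters here can be seen in a short Python sketch: without "$", longer addresses that merely start with the same digits are matched too. The address list is invented, and the variant with escaped dots is an extra suggestion that is not part of the original example. It reflects the fact that an unescaped dot matches any character, not just a literal ".".

```python
import re

addresses = ["212.37.64.10", "212.37.64.100", "212.37.64.101"]

# Start anchor only: anything that begins with these characters matches.
without_end = [a for a in addresses if re.search(r"^212.37.64.10", a)]

# Both anchors: only the complete address matches.
with_end = [a for a in addresses if re.search(r"^212.37.64.10$", a)]

# Assumed stricter variant: "\." matches a literal dot only.
escaped = [a for a in addresses if re.search(r"^212\.37\.64\.10$", a)]

# The unescaped pattern also accepts other characters in place of the dots.
loose = bool(re.search(r"^212.37.64.10$", "212x37y64z10"))

print(without_end)  # ['212.37.64.10', '212.37.64.100', '212.37.64.101']
print(with_end)     # ['212.37.64.10']
print(escaped)      # ['212.37.64.10']
print(loose)        # True
```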

You can also exclude a whole range:

RewriteCond %{REMOTE_ADDR} ^212.37.64.
RewriteRule ^.*$ - [F]

This example blocks the range of IP addresses from 212.37.64.0 to 212.37.64.255.
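The prefix pattern from this example can be tried against a few addresses in Python. This is a minimal sketch with invented sample addresses. It only exercises the pattern exactly as written above.

```python
import re

range_pattern = re.compile(r"^212.37.64.")

# Invented sample addresses inside and outside the range.
addresses = ["212.37.64.0", "212.37.64.255", "212.37.65.1", "10.212.37.64"]

blocked = [a for a in addresses if range_pattern.search(a)]
print(blocked)  # ['212.37.64.0', '212.37.64.255']
```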

And now a small exercise to test what you have learned (the solution will be given in the next part):

RewriteCond %{REMOTE_ADDR} ^212.37.64
RewriteRule ^.*$ - [F]

Attention, question!

If we write "^212.37.64" in the regular expression instead of "^212.37.64." (with a dot at the end), does it have the same effect, and will the same IP addresses be excluded?

So far we have used a simple RewriteRule that generates an error message. In the third part of this series, we will look at how to use RewriteRule to redirect visitors to specific files.