How mod_rewrite actually works. A guide for intermediate users

This article grew out of an idea for advanced training of our technical support staff who work with mod_rewrite. Practice has shown that after studying the many tutorials available in Russian, support staff handle template tasks well, but writing rules independently still proceeds by trial and error. The problem is that a good understanding of how mod_rewrite works requires studying the original English-language documentation, followed by either additional explanations or hours of experiments with RewriteLog.

The article describes how mod_rewrite operates. Understanding its principles lets you see exactly what each directive does and picture what is happening inside mod_rewrite at any given moment as the directives are processed.

I assume that the reader is already familiar with what mod_rewrite is, and I will not describe its fundamentals, which are easy to find on the Internet. Note also that the article covers mod_rewrite as used through directives in the .htaccess file. The differences in the <VirtualHost> context are described at the end of the article.

So, you have studied mod_rewrite, written a few RewriteRules, and have already run into endless redirects, into rules that for some reason do not catch your request, and into the unpredictable behavior of a group of rules, when a later rule unexpectedly changes the request that the earlier rules so painstakingly prepared.

What does RewriteRule work with?

The first RewriteRule receives the path from the directory containing .htaccess to the requested file. This string never starts with "/". Each subsequent RewriteRule receives the result of the previous transformations.

To fully understand how RewriteRule works, you first need to determine what it works with. Let's look at how Apache obtains the string that is initially passed to RewriteRule for processing in .htaccess.

When you first start working with mod_rewrite, it is logical to assume that it works with links (URLs). However, when mod_rewrite is used in .htaccess, this is not the case. In fact, RewriteRule is passed not the link but the path to the requested file.

Because of Apache's internal architecture, by the time .htaccess comes into play mod_rewrite can only operate on the path to the file that is to be processed. The reason is that before the request reaches mod_rewrite it may already have been changed by other modules (for example, mod_alias), so the final path to the file on the site may no longer match the original link. If mod_rewrite worked with the original link, it would undo the work of the modules that changed the request before it.

Therefore, mod_rewrite is given the absolute path to the file to be processed. mod_rewrite also knows the path to the .htaccess file that contains the RewriteRule directives. To turn the file path into something resembling the link the site developer intends to work with, mod_rewrite strips the absolute path to the .htaccess file from it.

So it is this path, with the path to .htaccess stripped off, that is passed to the first RewriteRule. For example:

  • Request: http://example.com/templates/silver/images/logo.gif
  • DocumentRoot: /var/www/example.com
  • The path to the file: /var/www/example.com/templates/silver/images/logo.gif
  • .htaccess is located at: /var/www/example.com/templates/.htaccess
  • The first RewriteRule receives: silver/images/logo.gif
  • Note: "templates/" has been stripped as well.

The path to .htaccess is stripped together with its trailing slash. One consequence: the string initially passed to RewriteRule never starts with "/".

It is important to remember what RewriteRule does not do. It does not process the site name, it does not process the arguments passed to the script, and it does not even see the whole link if .htaccess is not located in the site root. All of that is handled by RewriteCond, which is briefly discussed later. So:

# will not work - the pattern starts with /
RewriteRule ^/index.php$ /my-index.php

# will not work - RewriteRule does not examine the site name
RewriteRule ^example.com/.* http://www.example.com

# will not work - the link's arguments are not passed to RewriteRule
RewriteRule index.php\?newspage=([0-9]+) news.php?page=$1
# Works only if .htaccess is in the same place as the templates folder,
# for example, in the site root. That is, if .htaccess is at templates/.htaccess, the rule
# will NOT WORK, because mod_rewrite strips the path to .htaccess and the string reaches
# RewriteRule already without "templates/"
RewriteRule ^templates/common/yandex-money.gif$ templates/shared/yad.gif

When you are just starting out with mod_rewrite, I recommend using it only in the .htaccess in the site root. This makes its behavior somewhat easier to follow.
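
As a hedged sketch, here is how the three failing rules above might be rewritten so that they work. It assumes that .htaccess is in the site root, that the intent of the site-name rule was to redirect example.com to www.example.com, and that requests with a newspage parameter should be handled before the plain index.php rule. The site name and the query parameters are checked in RewriteCond through %{HTTP_HOST} and %{QUERY_STRING}, not in the RewriteRule pattern.

```apache
RewriteEngine On
RewriteBase /

# The site name is checked in RewriteCond against %{HTTP_HOST}
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Query parameters are checked in RewriteCond against %{QUERY_STRING};
# %1 refers to the group captured by the RewriteCond pattern
RewriteCond %{QUERY_STRING} (?:^|&)newspage=([0-9]+)(?:&|$)
RewriteRule ^index\.php$ news.php?page=%1

# The pattern has no leading "/": in .htaccess the string never starts with one
RewriteRule ^index\.php$ /my-index.php
```

The order matters here: if the plain index.php rule came first, it would change the string to "/my-index.php" and the newspage rule would never match.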

We have figured out what RewriteRule works with. Now let's see how it works.

How RewriteRule Works

RewriteRule simply transforms a string according to regular expressions, and that is all. RewriteRule works with a string, not with a link or a file path.

As we found out above, RewriteRule receives the path from .htaccess to the requested file. At this point it is most convenient to set paths and links aside and treat what RewriteRule works with as an ordinary string. This string is passed from one RewriteRule to the next and is modified whenever one of the rules matches.

In general, setting aside the complications of flags (some of which we discuss below) and of writing regular expressions (which this article barely touches), RewriteRule works VERY simply.

  1. Take the string.
  2. Compare it with the regular expression in the first argument.
  3. If there is a match, replace the entire string with the value of the second argument.
  4. Pass the string to the next RewriteRule.

That is all there is to it. To illustrate that RewriteRule works with a string, consider the following far-fetched example:

# Request: http://mysite.com/info.html
# The first RewriteRule receives "info.html"

# Turn the request into an arbitrary string.
RewriteRule ^info.html$ "I saw a turtle in the hole. And it was dancing rock-n-roll. And it was smiling. All in all, it was a very funny doll."

# "info.html" -> "I saw a turtle..."

# Replace this string with an external link.
RewriteRule turtle https://example.com/information/index.html

# "I saw a turtle..." -> "https://example.com/information/index.html"

# Replace the site name!
RewriteRule ^(.*)example.com(.*)$ $1example.org$2

# "https://example.com/information/index.html" -> "https://example.org/information/index.html"

# Replace the protocol!
RewriteRule ^https:(.*)$ ftp:$1

# "https://example.org/information/index.html" -> "ftp://example.org/information/index.html"

# Replace the end of the link.
RewriteRule ^(.*)/index.html$ $1/main.php

# "ftp://example.org/information/index.html" -> "ftp://example.org/information/main.php"

As you can see, RewriteRule does not care what it works with - it simply transforms the string according to the arguments you give it. If you like, you can store arbitrary data in the string; with enough desire, perseverance, and a good command of regular expressions, you could even implement tic-tac-toe in RewriteRule.

One note is needed here: although RewriteRule works with a plain string, it is still geared toward working with links. It therefore reacts in a special way to strings that begin with "https://" or similar (it remembers that we want an external redirect) and to the "?" character (it treats what follows as arguments to be substituted into the request). That does not concern us right now - the important thing is to understand that there is no magic in RewriteRule: it simply takes a string and changes it the way you told it to. External redirects and arguments are covered later in the article; there is plenty to say about them too.

After all the transformations are done and the last RewriteRule is executed, RewriteBase takes effect.

What is RewriteBase for?

If the resulting request is relative and differs from the original request, RewriteBase is prepended to it on the left. RewriteBase should always be specified in .htaccess. Its value is the path from the site root to .htaccess. RewriteBase is applied only after all RewriteRules, not between them.

We mentioned above that mod_rewrite, when working in .htaccess, receives the absolute path to the requested file. To pass it to RewriteRule, mod_rewrite strips off the path to .htaccess. Then the RewriteRules modify the request one after another. Once the request has been changed, Apache must restore the absolute path to the file it will eventually process. RewriteBase is essentially a hack that helps restore the original path to the file.

RewriteBase is applied after all transformations. This means that it does not change the request between RewriteRules and takes effect only once all RewriteRules have run.

After all transformations, RewriteBase checks whether the result is a relative or an absolute path. In the Apache context, relative and absolute are understood with respect to the site root:

  • images/logo.gif - relative.
  • /images/logo.gif - absolute (starts with a slash).
  • http://example.com/images/logo.gif - the most absolute of all.

If the path is absolute, RewriteBase does nothing. If it is relative, RewriteBase is prepended to it on the left. This works for both internal and external redirects:

# .htaccess is located in /images/
# RewriteBase is set to /images/
RewriteBase /images/

# Request: http://example.com/images/logo.gif
# Input to RewriteRule: "logo.gif"
RewriteRule ^logo.gif$ logo-orange.gif
# After RewriteRule: "logo.gif" -> "logo-orange.gif"
# After RewriteBase: "logo-orange.gif" -> "/images/logo-orange.gif"

# Request: http://example.com/images/header.png
# Input to RewriteRule: "header.png"
RewriteRule ^header.png$ /templates/rebranding/header.png
# After RewriteRule: "header.png" -> "/templates/rebranding/header.png"
# After RewriteBase: nothing changes, because the final result starts with "/".

# Request: http://example.com/images/director.tiff
# Input to RewriteRule: "director.tiff"
# Use an external relative redirect
RewriteRule ^director.tiff$ staff/manager/director.tiff [R=301]
# After RewriteRule: "director.tiff" -> "staff/manager/director.tiff"
# + mod_rewrite has noted that an external redirect is required
# After RewriteBase: "staff/manager/director.tiff" -> "/images/staff/manager/director.tiff"
# mod_rewrite recalls the external redirect:
# "/images/staff/manager/director.tiff" -> http://example.com/images/staff/manager/director.tiff

Usually, after some acquaintance with mod_rewrite, the following habit develops: 1) add "RewriteBase /" to every .htaccess, and 2) start every redirect target with a slash: "RewriteRule news.php /index.php?act=news". This helps avoid RewriteBase artifacts, but it is the wrong approach. Now that we know what RewriteBase does, we can formulate the following correct rules:

  1. RewriteBase must match the path from the site root to .htaccess.
  2. Start a redirect target with "/" only if you need to specify an absolute path from the site root to the file.

What happens if RewriteBase is not specified? By default, Apache sets it to the absolute file-system path to .htaccess (for example, /var/www/example.com/templates/). The flaw in this assumption shows up on external relative redirects:

# Request: http://example.com/index.php
# DocumentRoot: /var/www/example.com/
# .htaccess is in the site root, and RewriteBase is NOT SPECIFIED in it.
# So by default RewriteBase equals the absolute path to .htaccess: /var/www/example.com/

# Input to RewriteRule: "index.php"
RewriteRule ^index.php main.php [R]
# Output: "index.php" -> "main.php"
# mod_rewrite has noted that an external redirect is required

# The RewriteRules are finished
# mod_rewrite still applies RewriteBase, since it has a default value.
# Result: "main.php" -> "/var/www/example.com/main.php"

# Here mod_rewrite recalls the external redirect:
# "/var/www/example.com/main.php" -> http://example.com/var/www/example.com/main.php

# Not at all what was intended.
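
For comparison, a minimal sketch of the same redirect with RewriteBase set explicitly. It assumes, as in the example above, that .htaccess sits in the site root, so the correct value is "/".

```apache
# Request: http://example.com/index.php
# .htaccess is in the site root, so RewriteBase is "/"
RewriteEngine On
RewriteBase /

# Input to RewriteRule: "index.php"
RewriteRule ^index\.php$ main.php [R,L]
# After RewriteRule: "index.php" -> "main.php"
# After RewriteBase: "main.php" -> "/main.php"
# External redirect: "/main.php" -> http://example.com/main.php
```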

So, the request has passed through all the RewriteRules, after which RewriteBase was prepended to it if necessary. Should Apache now serve the file that the resulting path points to? No. The resulting request will now be processed again.

How mod_rewrite works. The [L] flag

mod_rewrite restarts request processing again and again until the request stops changing. And the [L] flag cannot stop it.

When writing more or less complex mod_rewrite configurations, it is important to understand that modification of the request does not end at the last RewriteRule. After the last RewriteRule has run and RewriteBase has been applied, mod_rewrite checks whether the request has changed. If it has, processing starts over from the beginning of .htaccess.

Apache does this because the request may have been redirected to a different directory while it was being modified. That directory may have its own .htaccess, which took no part in the previous round of processing. The new .htaccess may contain rules that affect how the request is handled - both mod_rewrite rules and rules of other modules. To handle this situation correctly, Apache must run the entire processing cycle again.

- Wait, but there is the [L] flag, which stops mod_rewrite from processing the request!

Not quite. The [L] flag stops the current iteration of request processing. However, if the request was changed by the RewriteRules that did manage to run, Apache starts the processing cycle again from the first RewriteRule.

# Request: http://example.com/a.html

RewriteBase /

RewriteRule ^a.html$ b.html [L]
RewriteRule ^b.html$ a.html [L]

The example above leads to an infinite loop of redirects and ultimately to an "Internal Server Error". Here the infinite loop is obvious, but in more complex configurations you may have to dig through the rules to determine which requests loop into each other.

To avoid such situations, it is recommended to use the [L] flag only when necessary. There are two kinds of necessity:

  1. When an external redirect is used - [L,R=301] or [L,R=302]. With an external redirect, further processing of the request is undesirable (see the section on the [R] flag below), so it is best to stop it.
  2. When .htaccess contains a loop that cannot be eliminated, and mod_rewrite must be forced to stop processing the request. A special construct is used in this case - see the tips on this topic at the end of the article.

The example below, however, does not loop. Try to work out why, and which file Apache will eventually serve.

# Request: http://example.com/a.html
# Start of .htaccess

RewriteBase /
RewriteRule ^a.html$ b.html
RewriteRule ^b.html$ a.html

# End of .htaccess

How mod_rewrite works. The [R] flag

The [R] flag does not stop request processing and return the external redirect immediately. Instead, it records the need for an external redirect, and processing continues with the subsequent RewriteRules. It is recommended to always use it together with the [L] flag.

The [R] flag tells Apache that an external redirect, not an internal one, must be performed. What is the difference? An internal redirect simply changes the path to the file that will be served to the user, while the user believes they are receiving the file they originally requested. With an external redirect, Apache returns a 301 or 302 response status instead of the file's contents and tells the browser the link it should request to get the file.

It would seem that on encountering the [R] flag Apache should immediately stop processing RewriteRules and return the external redirect to the user. However, recall the far-fetched example from the section "How RewriteRule Works". In it, we first signaled the need for an external redirect (there, by substituting an external link) and then went on modifying the link with the subsequent RewriteRules.

This is exactly how Apache behaves when an external redirect is requested. It merely makes a note to itself that once all the rules have run it must return status 302 (the default), and meanwhile it continues executing all the RewriteRules further down the list. We can keep modifying the request as we need; the only thing we cannot do is turn the redirect back into an internal one.

Nevertheless, you are unlikely to want to change the request in any way after requesting an external redirect. Therefore, when using the [R] flag, it is recommended to specify it together with [L]:

# BlackJack has moved to a nicer name
RewriteRule ^bj/(.*) blackjack/$1 [R=301,L]

# You can simply use an external link
RewriteRule ^bj/(.*) http://blackjack.example.com/$1 [L]

Instead of using the [R] flag, you can simply specify an external link. Apache will then work out for itself that an external redirect is needed. Here, as with an explicit [R] flag, it is recommended to use the [L] flag.

  • If an external redirect leads to the same site, it is better to use the [R] flag without a full link (in other words, a relative external redirect). This makes the rule independent of the site name.
  • If an external redirect leads to a different site, there is no option other than specifying the full external link.

How mod_rewrite works. Specifying query parameters and the [QSA] flag

Changing query parameters in a RewriteRule does not change the string that the next RewriteRule works with. However, it does change the %{QUERY_STRING} variable, which RewriteCond can work with.

Terminology used: "parameters" are query parameters; "arguments" are RewriteRule arguments.

With RewriteRule you can change not only the path to the file that will be processed but also the GET parameters passed to it. This is often used to hand the processing of human-readable URLs over to a common handler script, for example:

RewriteBase /

# Request: http://example.com/news/2010/07/12/grand-opening.html
# Input: "news/2010/07/12/grand-opening.html"
RewriteRule ^news/(.*)$ index.php?act=news&what=$1
# After RewriteRule: "news/2010/07/12/grand-opening.html" -> "index.php"
# %{QUERY_STRING}: "" -> "act=news&what=2010/07/12/grand-opening.html"

When RewriteRule encounters a question mark in its second argument, it understands that the request parameters are being changed. As a result, the following happens:

  1. RewriteRule replaces the string it works with by the part of the second argument before the question mark. Note that the new query parameters do not end up in the string that subsequent RewriteRules work with.
  2. The part of the second argument after the question mark goes into the %{QUERY_STRING} variable. If the [QSA] flag is specified, the new query parameters are added at the beginning of %{QUERY_STRING}. If the flag is not specified, %{QUERY_STRING} is completely replaced by the query parameters from the RewriteRule.

A couple more examples:

RewriteBase /

# Request: http://example.com/news/2010/?page=2
# Input to RewriteRule: "news/2010/"
RewriteRule ^news/(.*)$ index.php?act=news&what=$1
# After the transformation: "news/2010/" -> "index.php"
# Value of %{QUERY_STRING}: "page=2" -> "act=news&what=2010/"

Most likely, the rule above does not work as intended, because the page parameter is lost. Let's fix this:

RewriteBase /

# Request: http://example.com/news/2010/?page=2
# Input to RewriteRule: "news/2010/"
RewriteRule ^news/(.*)$ index.php?act=news&what=$1 [QSA]
# After the transformation: "news/2010/" -> "index.php"
# Value of %{QUERY_STRING}: "page=2" -> "act=news&what=2010/&page=2"

We added only the [QSA] flag, and the rule started working correctly.

It is important to understand that changing the query parameters changes %{QUERY_STRING}, which may later be used in RewriteCond. Keep this in mind when writing subsequent rules that check the parameters.

- Of course it changes, because the request goes back to Apache for reprocessing!

No, %{QUERY_STRING} changes immediately. I will not give the proof - more has already been written about parameters than is interesting to read :)
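
A small sketch of what this means in practice, built on the news rule from the examples above. The environment variable name IS_NEWS is hypothetical and used only for illustration. Following the behavior described here, the RewriteCond of the second rule sees the query string already set by the first rule within the same pass.

```apache
RewriteEngine On
RewriteBase /

# Request: http://example.com/news/2010/?page=2
# String: "news/2010/" -> "index.php"
# %{QUERY_STRING}: "page=2" -> "act=news&what=2010/"
RewriteRule ^news/(.*)$ index.php?act=news&what=$1

# Same pass: the condition is tested against the NEW query string,
# so it matches "act=news..." rather than the user's "page=2"
RewriteCond %{QUERY_STRING} ^act=news
RewriteRule ^index\.php$ - [E=IS_NEWS:1]
```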

What can you do to check, in RewriteCond, exactly the parameters that the user sent rather than the ones modified by RewriteRules? See the tips at the end of the article.

RewriteCond and Performance

The request is first matched against the RewriteRule pattern, and only then are the additional RewriteCond conditions checked.

A few words should be said about the order in which mod_rewrite executes directives. Since in .htaccess the RewriteCond lines come first and the RewriteRule follows, it seems that mod_rewrite checks all the conditions first and only then proceeds to the RewriteRule.

In fact, it is the other way around. mod_rewrite first checks whether the current value of the request matches the RewriteRule's regular expression, and only then checks all the conditions listed in RewriteCond.

So if you have a two-page-long regular expression in a RewriteRule and, with performance in mind, decide to guard the rule with additional RewriteCond conditions, be aware that nothing will come of it. In this case it is better to use the RewriteRule [C] or [S] flags to skip the more complex rule when simpler checks have failed.
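
A hedged sketch of the [S] approach: a cheap rule with a negated pattern and the "-" substitution (which leaves the string unchanged) skips the following rules when the request clearly cannot match them. The catalog/ prefix, catalog.php, and its parameter names are hypothetical, and the two rules after the check merely stand in for genuinely expensive ones.

```apache
RewriteEngine On
RewriteBase /

# Cheap check: if the string does NOT start with "catalog/",
# skip the next two rules ("-" leaves the string unchanged)
RewriteRule !^catalog/ - [S=2]

# Stand-ins for expensive rules; only "catalog/..." requests reach them
RewriteRule ^catalog/([a-z0-9-]+)/([a-z0-9-]+)/page-([0-9]+)\.html$ catalog.php?section=$1&item=$2&page=$3 [QSA]
RewriteRule ^catalog/([a-z0-9-]+)/$ catalog.php?section=$1 [QSA]
```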

RewriteCond variables and flags, other RewriteRule flags, and so on

Read the documentation.

We have covered the principles behind RewriteRule, RewriteBase, and the [L], [R], and [QSA] flags, and have also examined the request-processing mechanism inside mod_rewrite. What remains untouched: the other RewriteRule flags and the RewriteCond and RewriteMap directives.

Fortunately, these directives and flags hold no mysteries and work exactly as described in most tutorials. To understand them, reading the official documentation is enough. First of all, I recommend studying the list of variables that can be checked in RewriteCond - %{QUERY_STRING}, %{THE_REQUEST}, %{REMOTE_ADDR}, %{HTTP_HOST}, %{HTTP:header}, and so on.
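
A few short illustrations of these variables in RewriteCond, as a sketch only. The address range (a documentation range), the admin/ path, the example.org host, and the mobile/ directory are assumptions made for the example, and .htaccess is assumed to be in the site root.

```apache
RewriteEngine On
RewriteBase /

# %{REMOTE_ADDR}: forbid admin/ for everyone outside 192.0.2.0/24
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.
RewriteRule ^admin/ - [F]

# %{HTTP_HOST}: send example.org visitors to example.com
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

# %{HTTP:header}: read an arbitrary request header, here User-Agent
RewriteCond %{HTTP:User-Agent} Mobile [NC]
RewriteRule ^index\.html$ mobile/index.html
```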

Differences between mod_rewrite in the .htaccess context and in the <VirtualHost> context

In the <VirtualHost> context, mod_rewrite works exactly the opposite way.

As I said at the beginning of the article, everything described above concerns the use of mod_rewrite in the .htaccess context. If mod_rewrite is used in <VirtualHost>, it works differently:

  • In <VirtualHost>, RewriteRule receives the entire request path, from the first slash up to the start of the GET parameters: "http://example.com/some/news/category/post.html?comments_page=3" -> "/some/news/category/post.html". This string always begins with "/".
  • The second argument of RewriteRule must also begin with "/", otherwise the result is a "Bad Request".
  • RewriteBase has no meaning.
  • The rules are traversed only once. The [L] flag really does end the processing of all rules described in <VirtualHost>, with no subsequent iterations.
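
For contrast, a sketch of the news rule from earlier written for the <VirtualHost> context. ServerName and DocumentRoot follow the values used in the article's examples; the port is an assumption.

```apache
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example.com

    RewriteEngine On

    # Request: http://example.com/news/2010/?page=2
    # Input to RewriteRule: "/news/2010/" (leading slash, no query string)
    # Both the pattern and the substitution start with "/"
    RewriteRule ^/news/(.*)$ /index.php?act=news&what=$1 [QSA,L]
</VirtualHost>
```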

Tips and solutions

This section gathers tips that could have been given in the course of the article but were left out of the main text for brevity.

Creating Regular Expressions

Try to write regular expressions so that they narrowly match exactly the requests you want to modify, so that RewriteRules do not accidentally fire on some other request. For example:

# Start every regular expression with "^" (start of string)
# and end it with "$" (end of string):
RewriteRule ^news\.php$ index.php
# Do this even where it is not strictly necessary - for consistency and a more readable configuration:
RewriteRule ^news/(.*)$ index.php

# If the pattern should match only digits, say so explicitly.
# If some digits are constant, write them out explicitly.
# If the rest of the request cannot contain slashes, exclude them.
# Do not forget to escape "." (dots).
# The following rule targets requests like http://example.com/news/2009/07/28/b-effect.html
RewriteRule ^news/20[0-9]{2}/[0-9]{2}/[0-9]{2}/[^/]+\.html$ index.php

Changing external redirects

Although mod_rewrite lets you change even external redirects with RewriteRule, right down to the protocol, I strongly advise against it. The example with changing external redirects is used in this article only to get away from notions such as "links" and "files" and to show more clearly that RewriteRule works with a plain string.

I do not think the developers of mod_rewrite expected anyone to do this, so all sorts of artifacts are possible. Please do not do it.

How to stop an infinite loop

Sometimes the redirect logic on a site is such that, without special measures, mod_rewrite treats it as an infinite redirect loop. Take the following example.

A site had a page /info.html. An SEO specialist decided that search engines would index the page better if it were called /information.html, and asked for an external redirect from info.html to information.html. However, for some reason the site developer cannot simply rename info.html to information.html and set up a redirect - the content must be served directly from the file info.html. He writes the following rules:

# perform an external redirect
RewriteRule ^info.html information.html [R,L]
# but still serve info.html for the request /information.html
RewriteRule ^information.html info.html

... and runs into an infinite loop. Every request for /information.html gets an external redirect back to /information.html.

This problem can be solved in at least two ways. One of them has already been described on Habr: set an environment variable and stop the redirects based on its value. The code looks like this:

RewriteCond %{ENV:REDIRECT_FINISH} !^$
RewriteRule ^ - [L]

RewriteRule ^info.html$ information.html [R,L]
RewriteRule ^information.html$ info.html [E=FINISH:1]

Note that after the internal redirect the variable name is prefixed with 'REDIRECT_', which is why the condition checks REDIRECT_FINISH rather than FINISH.

The second way is to check in THE_REQUEST what exactly was requested by the user:

# The external redirect happens only if the user requested info.html.
# If info.html is the result of an internal redirect, the rule does not fire.
RewriteCond %{THE_REQUEST} "^(GET|POST|HEAD) /info.html HTTP/[0-9.]+$"
RewriteRule ^info.html$ information.html [R,L]

RewriteRule ^information.html$ info.html
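
A third option, not covered in the article and offered only as a sketch: Apache 2.4 (starting with 2.3.9) provides an [END] flag which, unlike [L], also prevents any further rewrite processing in the per-directory context. Assuming such a version and an .htaccess in the site root:

```apache
RewriteEngine On
RewriteBase /

# External redirect for anyone who asks for info.html
RewriteRule ^info\.html$ information.html [R,L]

# Serve info.html for /information.html and stop all rewriting,
# so the rule above is not applied to the rewritten request
RewriteRule ^information\.html$ info.html [END]
```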

Analyzing the original user request - working around Apache's URL decoding

When processing a request, Apache decodes the URL-encoded characters of the original request. In some cases this is undesirable - the developer wants to check the original, unmodified user request. This can be done by checking the %{THE_REQUEST} variable in RewriteCond:

RewriteCond %{THE_REQUEST} ^GET[\ ]+/tag/([^/]+)/[\ ]+HTTP.*$
RewriteRule ^(.*)$ index.php?tag=%1 [L]

Recommended Documentation

Official Apache documentation

Technical details