
How mod_rewrite really works: a guide for intermediate users


This article grew out of an effort to train our technical support staff in mod_rewrite. Experience showed that after working through the many Russian-language tutorials, staff solved the standard example tasks well, but wrote their own rules by trial and a great deal of error. The problem is that a good understanding of mod_rewrite requires studying the original English documentation, followed by either further explanation or hours of experiments with RewriteLog.

This article describes the internal mechanism of mod_rewrite. Understanding how it works lets you see exactly what each directive does and what is happening inside mod_rewrite at any given moment while it processes the directives.

I assume the reader already knows what mod_rewrite is, so I will not describe the basics, which are easy to find online. Note also that the article covers mod_rewrite directives used in an .htaccess file. The differences when they are used in the <VirtualHost> context are described at the end of the article.

So, you have studied mod_rewrite, written a few RewriteRules, and run into endless redirects, into rules that for some reason do not catch your request, and into rule groups that behave unpredictably when a later rule unexpectedly changes a request carefully prepared by the earlier ones.

What RewriteRule works with

The first RewriteRule receives the path from the directory containing .htaccess to the requested file. This string never starts with "/". Each subsequent RewriteRule receives the result of the previous rewrites.

To understand thoroughly how RewriteRule works, you first need to establish what it works with. Let's look at how Apache obtains the string that is initially passed to RewriteRule in .htaccess.

When you first start working with mod_rewrite, it is natural to assume that it works with URLs. With mod_rewrite in .htaccess, however, this is not the case: RewriteRule receives not the URL but the path to the requested file.

Because of Apache's internal architecture, by the time .htaccess comes into play, mod_rewrite can only operate on the path to the file being processed. Before the request reaches mod_rewrite, other modules (for example, mod_alias) may already have changed it, so the final file path on the site may no longer correspond to the original URL. If mod_rewrite worked with the original URL, it would undo the work of the modules that modified the request before it.

Therefore, mod_rewrite is given the absolute path to the file being processed. mod_rewrite also knows the path to the .htaccess file containing the RewriteRules. To turn the file path into something resembling a URL that the site developer can work with, mod_rewrite cuts the absolute path of the .htaccess directory off the front of the file path.

It is this path, with the path to .htaccess removed, that is passed to the first RewriteRule. For example:

  • Request: http://example.com/templates/silver/images/logo.gif
  • DocumentRoot: /var/www/example.com
  • File path: /var/www/example.com/templates/silver/images/logo.gif
  • .htaccess is at: /var/www/example.com/templates/.htaccess
  • The first RewriteRule receives: silver/images/logo.gif
  • Note that "templates/" is cut off as well.

The path to .htaccess is cut off together with the trailing slash. It follows that the string initially passed to RewriteRule never starts with "/".

It is important to remember what RewriteRule does not do. It does not see the site name or the arguments passed to the script, and if .htaccess is not in the site root, it does not see the whole URL path either. All of this is handled by RewriteCond, which is briefly touched upon later. So:
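To make the stripping step concrete, here is a toy Python model. It is a sketch under simplifying assumptions (plain prefix removal on already-resolved paths; no symlinks, mod_alias, or URL decoding), and `rewrite_input` is a hypothetical helper, not part of Apache or mod_rewrite.

```python
def rewrite_input(file_path: str, htaccess_dir: str) -> str:
    """Toy model: the string the first RewriteRule sees in .htaccess.

    Hypothetical helper. Assumes htaccess_dir is the directory holding
    .htaccess and that file_path lies inside it.
    """
    # Normalize the directory to end with exactly one slash. Because that
    # slash is cut off too, the result never starts with "/".
    prefix = htaccess_dir.rstrip("/") + "/"
    if not file_path.startswith(prefix):
        raise ValueError("file is not under the .htaccess directory")
    return file_path[len(prefix):]


print(rewrite_input(
    "/var/www/example.com/templates/silver/images/logo.gif",
    "/var/www/example.com/templates",
))
# silver/images/logo.gif
```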

# will not work: the pattern starts with /
RewriteRule ^/index.php$ /my-index.php

# will not work: RewriteRule does not see the site name
RewriteRule ^example.com/.* http://www.example.com

# will not work: the URL arguments never reach RewriteRule
RewriteRule index.php\?newspage=([0-9]+) news.php?page=$1

# Works only if .htaccess is in the same directory as the templates folder,
# for example in the site root. If .htaccess is at templates/.htaccess, the rule
# will NOT work, because mod_rewrite cuts off the path to .htaccess and the string
# reaches RewriteRule without "templates/"
RewriteRule ^templates/common/yandex-money.gif$ templates/shared/yad.gif

When starting out with mod_rewrite, I recommend using it only in the .htaccess file in the site root. This makes its behavior somewhat easier to follow.

We have figured out what RewriteRule works with. Now let's see how it works.

How RewriteRule Works

RewriteRule simply transforms a string using regular expressions, and that is all. RewriteRule works with a string, not with a URL or a file path.

As we found out above, RewriteRule receives as input the path from .htaccess to the requested file. At this point it is most convenient to forget about paths and URLs and treat what RewriteRule works with as an ordinary string. This string is passed from RewriteRule to RewriteRule and is modified whenever one of the rules matches.

In general, setting aside the intricacies of flags (some of which we discuss below) and of writing regular expressions (which this article barely touches on), RewriteRule works VERY simply:

  1. Take the string.
  2. Compare it with the regular expression in the first argument.
  3. If it matches, replace the entire string with the value of the second argument.
  4. Pass the string on to the next RewriteRule.
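The four steps can be sketched as a small Python model. This is an illustration under stated assumptions: `apply_rules` and `expand` are hypothetical helpers, Python's `re` stands in for Apache's PCRE (the patterns used here behave the same in both), and flags, RewriteCond, and the special handling of URLs and "?" are ignored.

```python
import re


def expand(substitution: str, match: re.Match) -> str:
    """Replace $1..$9 in the substitution with the captured groups."""
    return re.sub(r"\$(\d)", lambda g: match.group(int(g.group(1))) or "", substitution)


def apply_rules(s: str, rules: list[tuple[str, str]]) -> str:
    """Toy model of one pass through a list of (pattern, substitution) rules."""
    for pattern, substitution in rules:
        m = re.search(pattern, s)        # step 2: compare with the regex
        if m:
            s = expand(substitution, m)  # step 3: replace the WHOLE string
        # step 4: the (possibly new) string goes to the next rule
    return s


# The rules from the fanciful "turtle" example in this section.
rules = [
    (r"^info.html$", "I saw a turtle in the hole. And it was dancing rock-n-roll."),
    (r"turtle", "https://example.com/information/index.html"),
    (r"^(.*)example.com(.*)$", "$1example.org$2"),
    (r"^https:(.*)$", "ftp:$1"),
    (r"^(.*)/index.html$", "$1/main.php"),
]

print(apply_rules("info.html", rules))
# ftp://example.org/information/main.php
```

Nothing in the model knows about URLs or files, which is the point the example below makes with real RewriteRules.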

That is really all there is to it. To illustrate that RewriteRule works with a string, consider the following fanciful example:

# Request: http://mysite.com/info.html
# The first RewriteRule receives "info.html"

# Turn the request into an arbitrary string.
RewriteRule ^info.html$ "I saw a turtle in the hole. And it was dancing rock-n-roll. And it was smiling. All in all, it was a very funny doll."

# "info.html" -> "I saw a turtle..."

# Replace this string with an external URL.
RewriteRule turtle https://example.com/information/index.html

# "I saw a turtle..." -> "https://example.com/information/index.html"

# Replace the site name!
RewriteRule ^(.*)example.com(.*)$ $1example.org$2

# "https://example.com/information/index.html" -> "https://example.org/information/index.html"

# Replace the protocol!
RewriteRule ^https:(.*)$ ftp:$1

# "https://example.org/information/index.html" -> "ftp://example.org/information/index.html"

# Replace the final target.
RewriteRule ^(.*)/index.html$ $1/main.php

# "ftp://example.org/information/index.html" -> "ftp://example.org/information/main.php"

As you can see, RewriteRule does not care what it works with: it simply transforms the string according to the arguments it is given. You could store arbitrary data in the string and, with enough persistence and a good command of regular expressions, even write tic-tac-toe in RewriteRules.

A caveat is in order here: although RewriteRule works with a plain string, it is still designed for URLs. Therefore it reacts specially to strings beginning with "https://" or similar (telling it that we want an external redirect) and to the "?" character (everything after it is treated as query arguments to be substituted into the request). That does not concern us right now; what matters is that there is no magic in RewriteRule: it just takes the string and changes it as you told it to. External redirects and arguments are covered later in the article, as there is more to say about them.

After all the rewrites have been performed and the last RewriteRule has executed, RewriteBase comes into play.

What is RewriteBase for?

If the rewritten request is relative and differs from the original, RewriteBase is prepended to it on the left. RewriteBase should be specified in .htaccess, and its value is the path from the site root to the .htaccess directory. RewriteBase is applied only after all RewriteRules, never between them.

As mentioned above, mod_rewrite working in .htaccess receives the absolute path to the requested file. To pass it to RewriteRule, mod_rewrite cuts off the path to .htaccess. The RewriteRules then change the request one by one. Once the request has been changed, Apache must restore the absolute path to the file it will eventually process. RewriteBase is essentially a hack that helps restore that original path.

RewriteBase is applied after all rewrites. This means it does not change the request between RewriteRules and takes effect only once all RewriteRules have run.

After all the rewrites, RewriteBase checks whether the resulting path is relative or absolute. In the Apache context, "relative" and "absolute" are counted from the site root:

  • images/logo.gif is relative.
  • /images/logo.gif is absolute (it starts with a slash).
  • http://example.com/images/logo.gif is the most absolute of all.

If the path is absolute, RewriteBase does nothing. If it is relative, RewriteBase prepends itself on the left. This applies to both internal and external redirects:

# .htaccess is in /images/
# RewriteBase is set to /images/
RewriteBase /images/

# Request: http://example.com/images/logo.gif
# RewriteRule receives "logo.gif"
RewriteRule ^logo.gif$ logo-orange.gif
# After RewriteRule: "logo.gif" -> "logo-orange.gif"
# After RewriteBase: "logo-orange.gif" -> "/images/logo-orange.gif"

# Request: http://example.com/images/header.png
# RewriteRule receives "header.png"
RewriteRule ^header.png$ /templates/rebranding/header.png
# After RewriteRule: "header.png" -> "/templates/rebranding/header.png"
# After RewriteBase: nothing changes, because the final result starts with "/".

# Request: http://example.com/images/director.tiff
# RewriteRule receives "director.tiff"
# Use a relative external redirect
RewriteRule ^director.tiff$ staff/manager/director.tiff [R=301]
# After RewriteRule: "director.tiff" -> "staff/manager/director.tiff"
# + mod_rewrite remembers that an external redirect is pending
# After RewriteBase: "staff/manager/director.tiff" -> "/images/staff/manager/director.tiff"
# mod_rewrite recalls the external redirect:
# "/images/staff/manager/director.tiff" -> http://example.com/images/staff/manager/director.tiff
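The RewriteBase step, together with the way a pending external redirect is turned into a full URL, can be modelled roughly as follows. This is a toy sketch: `finalize` and its `host` parameter are hypothetical, and treating any string containing "://" as a full URL is a simplifying assumption.

```python
def finalize(result: str, rewrite_base: str, redirect: bool, host: str) -> str:
    """Toy model of what happens after the last RewriteRule in .htaccess.

    rewrite_base: the RewriteBase value (or Apache's filesystem default).
    redirect: whether an external redirect was requested ([R]).
    host: hypothetical scheme + host used to build the redirect URL.
    """
    is_url = "://" in result  # simplifying assumption: "full URL"
    if not is_url and not result.startswith("/"):
        # Relative result: RewriteBase is prepended on the left.
        result = rewrite_base.rstrip("/") + "/" + result
    if redirect and not is_url:
        # External redirect: the site-absolute path becomes a full URL.
        result = host.rstrip("/") + result
    return result


print(finalize("logo-orange.gif", "/images/", False, "http://example.com"))
# /images/logo-orange.gif
print(finalize("/templates/rebranding/header.png", "/images/", False, "http://example.com"))
# /templates/rebranding/header.png
print(finalize("staff/manager/director.tiff", "/images/", True, "http://example.com"))
# http://example.com/images/staff/manager/director.tiff
```

Passing a filesystem path as `rewrite_base` reproduces the broken redirect from the missing-RewriteBase example.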

After some experience with mod_rewrite, people usually develop this habit: 1) add "RewriteBase /" to every .htaccess, and 2) start every substitution with a slash: "RewriteRule news.php /index.php?act=news". This gets rid of RewriteBase artifacts, but it is the wrong approach. Now that we know what RewriteBase does, we can state the correct rules:

  1. RewriteBase must match the path from the site root to .htaccess.
  2. Start a substitution with "/" only when you need to give an absolute path from the site root to the file.

What happens if you do not specify RewriteBase? By default, Apache sets it to the absolute filesystem path of the .htaccess directory (for example, /var/www/example.com/templates/). This incorrect default shows up in relative external redirects:

# Request: http://example.com/index.php
# DocumentRoot: /var/www/example.com/
# .htaccess is in the site root and does NOT set RewriteBase.
# So RewriteBase defaults to the absolute path to .htaccess: /var/www/example.com/

# RewriteRule receives "index.php"
RewriteRule ^index.php main.php [R]
# Result: "index.php" -> "main.php"
# mod_rewrite remembers that an external redirect is needed

# No more RewriteRules
# mod_rewrite still applies RewriteBase, since it has a default value.
# Result: "main.php" -> "/var/www/example.com/main.php"

# Now mod_rewrite recalls the external redirect:
# "/var/www/example.com/main.php" -> http://example.com/var/www/example.com/main.php

# Not at all what was intended.

So the request has gone through all the RewriteRules, and RewriteBase has been added to it if needed. Does Apache now serve the file the resulting path points to? No. The resulting request is now processed all over again.

How mod_rewrite works: the [L] flag

mod_rewrite processes the request again and again until it stops changing. The [L] flag cannot prevent this.

When writing more or less complex mod_rewrite configurations, it is important to understand that rewriting does not end with the last RewriteRule. After the last RewriteRule has run and RewriteBase has been applied, mod_rewrite checks whether the request has changed. If it has, processing starts over from the beginning of .htaccess.

Apache does this because while the request was being rewritten, it may have been redirected to another directory. That directory may have its own .htaccess that took no part in the previous processing of the request, and that new .htaccess may contain rules affecting how the request is handled, both mod_rewrite rules and rules of other modules. To handle this correctly, Apache has to restart the whole processing cycle.

- Wait, but there is the [L] flag, which stops mod_rewrite from processing the request!

Not quite. The [L] flag stops the current iteration of request processing. However, if the RewriteRules that did manage to run changed the request, Apache starts the processing cycle again from the first RewriteRule.

# Request: http://example.com/a.html

RewriteBase /

RewriteRule ^a.html$ b.html [L]
RewriteRule ^b.html$ a.html [L]

The example above leads to an endless loop of redirects and, as a result, an "Internal Server Error". Here the loop is obvious, but in more complex configurations you may need to dig into the rules to work out which requests keep bouncing between each other.

To avoid such situations, use the [L] flag only when it is really needed. There are two such cases:

  1. With an external redirect: [L,R=301] or [L,R=302]. With an external redirect, further processing of the request is undesirable (see the [R] flag below), so it is better to stop.
  2. When .htaccess contains a loop that cannot be eliminated, and mod_rewrite's processing of the request must be stopped forcibly. A special construct is used for this; see the tips on this topic at the end of the article.

The example below, however, will not loop. Try to work out why, and which file Apache will end up serving.

# Request: http://example.com/a.html
# Start of .htaccess

RewriteBase /
RewriteRule ^a.html$ b.html
RewriteRule ^b.html$ a.html

# End of .htaccess
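To check your answer, you can trace both rule sets through a toy model of the restart cycle. Everything here is a hypothetical sketch: substitutions are literal strings, a pass ends early on a rule marked `last` (standing in for [L]), RewriteBase is ignored, and `max_rounds=10` is only loosely modelled on Apache's `LimitInternalRecursion` limit.

```python
import re


def one_pass(s, rules):
    """One pass over the rules; a matching rule with last=True acts like [L]."""
    for pattern, substitution, last in rules:
        if re.search(pattern, s):
            s = substitution        # toy: literal substitutions only
            if last:
                break               # [L] ends only this pass
    return s


def process(request, rules, max_rounds=10):
    """Repeat passes until one leaves the string unchanged."""
    s = request
    for _ in range(max_rounds):
        new = one_pass(s, rules)
        if new == s:
            return s                # unchanged: this is what gets served
        s = new                     # changed: start over from the first rule
    # Too many rounds: modelled as the "Internal Server Error".
    raise RuntimeError("rewrite loop: Internal Server Error")


with_l = [(r"^a\.html$", "b.html", True), (r"^b\.html$", "a.html", True)]
without_l = [(r"^a\.html$", "b.html", False), (r"^b\.html$", "a.html", False)]

try:
    process("a.html", with_l)
except RuntimeError as err:
    print("with [L]:", err)
print("without [L]:", process("a.html", without_l))
```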

How mod_rewrite works: the [R] flag

The [R] flag does not stop request processing and return an external redirect immediately. Instead, it records that an external redirect is needed, and processing continues with the next RewriteRule. It is recommended to always combine it with the [L] flag.

The [R] flag tells Apache to perform an external redirect rather than an internal one. What is the difference? An internal redirect simply changes the path to the file that will be served to the user, while the user believes they are receiving the file they originally requested. With an external redirect, instead of the file contents Apache returns a 301 or 302 response status together with the URL the browser should request instead.

It might seem that on encountering the [R] flag, Apache should immediately stop processing RewriteRules and return the external redirect to the user. However, recall the fanciful example from the section "How RewriteRule Works". There we first substituted a full external URL, which also signals the need for an external redirect, and then kept changing it with the following RewriteRules.

This is exactly how Apache behaves when an external redirect is specified. It simply notes that once all the rules have run, it must return status 302 (by default), but it continues executing the remaining RewriteRules down the list. We can keep changing the request as we like; the only thing we cannot do is turn the redirect back into an internal one.

However, you are unlikely to want to change the request after deciding on an external redirect. Therefore, when using the [R] flag, it is recommended to specify it together with [L]:

# BlackJack has moved to a nicer name
RewriteRule ^bj/(.*) blackjack/$1 [R=301,L]

# Or simply use an external URL
RewriteRule ^bj/(.*) http://blackjack.example.com/$1 [L]

Instead of using the [R] flag, you can simply specify an external URL. In that case Apache works out by itself that an external redirect is needed. Here too, as with an explicit [R] flag, it is recommended to use the [L] flag.

  • If the external redirect leads to the same site, it is better to use the [R] flag without a full URL (in other words, a relative external redirect). This makes the rule independent of the site name.
  • If the external redirect leads to another site, there is no way around specifying the full external URL.
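The "mark now, redirect later" behavior of [R] can be modelled like this. It is a toy sketch: `run_pass` is hypothetical, flags are plain Python sets, and the second rule rewriting `blackjack/` to `casino/` is an invented example showing what happens when [L] is omitted.

```python
import re


def expand(substitution, m):
    # Substitute $1..$9 with the captured groups of the match.
    return re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))) or "", substitution)


def run_pass(s, rules):
    """Toy model of one pass in which [R] only marks a pending redirect.

    rules: (pattern, substitution, flags), flags being a set like {"R", "L"}.
    Returns (final_string, redirect_pending).
    """
    redirect = False
    for pattern, substitution, flags in rules:
        m = re.search(pattern, s)
        if not m:
            continue
        s = expand(substitution, m)
        if "R" in flags:
            redirect = True   # remembered, but processing goes on
        if "L" in flags:
            break             # [L] ends the pass right here
    return s, redirect


# Hypothetical later rule that would rewrite the redirect target again.
later = (r"^blackjack/(.*)$", "casino/$1", set())

print(run_pass("bj/rules.html", [(r"^bj/(.*)", "blackjack/$1", {"R"}), later]))
# ('casino/rules.html', True)
print(run_pass("bj/rules.html", [(r"^bj/(.*)", "blackjack/$1", {"R", "L"}), later]))
# ('blackjack/rules.html', True)
```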

How mod_rewrite works: query parameters and the [QSA] flag

Changing the request parameters in a RewriteRule does not change the string that the next RewriteRule works with. However, changing the parameters does change the %{QUERY_STRING} variable, which RewriteCond can work with.

Terminology: "parameters" means the request parameters, "arguments" means the RewriteRule arguments.

Besides the path to the file being processed, RewriteRule can also change the GET parameters passed to it. This is often used to hand human-readable URLs over to a common handler script, for example:

RewriteBase /

# Request: http://example.com/news/2010/07/12/grand-opening.html
# Input: "news/2010/07/12/grand-opening.html"
RewriteRule ^news/(.*)$ index.php?act=news&what=$1
# After RewriteRule: "news/2010/07/12/grand-opening.html" -> "index.php"
# %{QUERY_STRING}: "" -> "act=news&what=2010/07/12/grand-opening.html"

When RewriteRule encounters a question mark in its second argument, it understands that the request parameters are being changed. The result is as follows:

  1. RewriteRule replaces the string it works with by the part of the second argument before the question mark. Note that the new request parameters do not end up in the string that subsequent RewriteRules will work with.
  2. The part of the second argument after the question mark goes into the %{QUERY_STRING} variable. If the [QSA] flag is specified, these parameters are added at the beginning of the existing %{QUERY_STRING}. If the flag is not specified, %{QUERY_STRING} is completely replaced by the parameters from the RewriteRule.

A couple more examples:

RewriteBase /

# Request: http://example.com/news/2010/?page=2
# RewriteRule input: "news/2010/"
RewriteRule ^news/(.*)$ index.php?act=news&what=$1
# After the rewrite: "news/2010/" -> "index.php"
# %{QUERY_STRING}: "page=2" -> "act=news&what=2010/"

The rule above most likely works incorrectly, because the page parameter is lost. Let's fix this:

RewriteBase /

# Request: http://example.com/news/2010/?page=2
# RewriteRule input: "news/2010/"
RewriteRule ^news/(.*)$ index.php?act=news&what=$1 [QSA]
# After the rewrite: "news/2010/" -> "index.php"
# %{QUERY_STRING}: "page=2" -> "act=news&what=2010/&page=2"

We added only the [QSA] flag, and the rule began to work correctly.
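The handling of "?" and [QSA] can be summarized in a few lines of Python. This is a sketch, assuming the `$N` references have already been expanded; `apply_query` is a hypothetical name, not a mod_rewrite API.

```python
def apply_query(substitution, query_string, qsa=False):
    """Toy model of a RewriteRule substitution that may contain "?".

    Returns (string_for_next_rule, new_query_string).
    """
    path, sep, new_params = substitution.partition("?")
    if not sep:
        return path, query_string  # no "?": QUERY_STRING is left untouched
    if qsa and query_string:
        # [QSA]: the new parameters come first, the original ones after them.
        new_params = f"{new_params}&{query_string}" if new_params else query_string
    return path, new_params


print(apply_query("index.php?act=news&what=2010/", "page=2"))
# ('index.php', 'act=news&what=2010/')
print(apply_query("index.php?act=news&what=2010/", "page=2", qsa=True))
# ('index.php', 'act=news&what=2010/&page=2')
```

Note that only `path` would be handed to the next rule, which is why the new parameters never appear in the string later RewriteRules see.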

It is important to understand that changing the query parameters changes %{QUERY_STRING}, which may be used later in RewriteCond. Keep this in mind when writing subsequent rules that check the arguments.

- Of course it changes, because the request goes back to Apache for reprocessing!

No, %{QUERY_STRING} changes immediately. I will not give the proof here; more has already been written about parameters than is interesting to read :)

How can you check in RewriteCond the request parameters exactly as the user sent them, rather than as modified by RewriteRules? See the tips at the end of the article.

RewriteCond and performance

The request is first checked against the RewriteRule pattern, and only then are the additional RewriteCond conditions checked.

A few words about the order in which mod_rewrite executes directives. Since in .htaccess the RewriteCond lines come first, followed by the RewriteRule, it looks as though mod_rewrite first checks all the conditions and then proceeds to execute the RewriteRule.

In fact, it is the other way around. mod_rewrite first checks whether the current value of the request matches the RewriteRule regular expression, and only then checks all the conditions listed in the RewriteCond lines.

So if your RewriteRule contains a two-page regular expression and, with performance in mind, you decided to restrict that rule with additional RewriteConds, be aware that this will not help. In that case it is better to use the RewriteRule [C] or [S] flags to skip the more complex rule when the simpler checks fail.
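A toy Python model of this evaluation order: the RewriteRule pattern is tried first, and the stand-in conditions run only if it matches. `rule_fires` and the callable conditions are hypothetical; real RewriteCond lines test server variables.

```python
import re


def rule_fires(s, pattern, conditions):
    """Toy model of the evaluation order for a single RewriteRule.

    conditions: zero-argument callables standing in for RewriteCond lines.
    """
    if not re.search(pattern, s):
        return False  # pattern failed: no condition is evaluated at all
    # Only now are the conditions checked (AND, stopping at the first failure).
    return all(cond() for cond in conditions)


evaluated = []


def expensive_condition():
    evaluated.append("checked")  # record that this condition actually ran
    return True


print(rule_fires("about.html", r"^news/", [expensive_condition]), evaluated)
# False []
print(rule_fires("news/42", r"^news/", [expensive_condition]), evaluated)
# True ['checked']
```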

RewriteCond variables and flags, other RewriteRule flags, etc.

Read the documentation.

We have covered how RewriteRule, RewriteBase, and the [L], [R], and [QSA] flags work, and have examined the request-processing mechanism inside mod_rewrite. Left uncovered are the other RewriteRule flags and the RewriteCond and RewriteMap directives.

Fortunately, these directives and flags hold no mysteries and work exactly as described in most tutorials. Reading the official documentation is enough to understand them. First of all, I recommend studying the list of variables that can be checked in RewriteCond (%{QUERY_STRING}, %{THE_REQUEST}, %{REMOTE_ADDR}, %{HTTP_HOST}, %{HTTP:header}, etc.).

Differences between mod_rewrite in the .htaccess context and in the <VirtualHost> context

In the <VirtualHost> context, mod_rewrite works exactly the opposite way.

As I said at the beginning of the article, everything described above concerns using mod_rewrite in the .htaccess context. If mod_rewrite is used in <VirtualHost>, it works differently:

  • In <VirtualHost>, RewriteRule receives the entire request path, from the first slash up to the start of the GET parameters: "http://example.com/some/news/category/post.html?comments_page=3" -> "/some/news/category/post.html". This string always starts with /.
  • The second argument of RewriteRule must also start with /, otherwise the result is a "Bad Request".
  • RewriteBase has no meaning.
  • The rules are passed through only once. The [L] flag really does end the processing of all the rules described in <VirtualHost>, without any further iterations.
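For comparison with the .htaccess case, the input RewriteRule sees in <VirtualHost> can be approximated with Python's standard `urllib.parse.urlsplit`. This is a rough sketch only: it ignores URL decoding and anything other modules may have done, and `server_context_input` is a hypothetical name.

```python
from urllib.parse import urlsplit


def server_context_input(url: str) -> str:
    """Approximate RewriteRule input in <VirtualHost>: the path only,
    always starting with "/", with host and query string dropped."""
    path = urlsplit(url).path
    return path or "/"


print(server_context_input(
    "http://example.com/some/news/category/post.html?comments_page=3"
))
# /some/news/category/post.html
```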

Tips and solutions

Here are tips that could have been given in the course of the article but were left out of the main text for brevity.

Writing regular expressions

Try to write regular expressions that match as narrowly as possible exactly the requests you want to modify, so that RewriteRules do not accidentally fire for other requests. For example:

# Start every regular expression with "^" (start of string)
# and end it with "$" (end of string):
RewriteRule ^news\.php$ index.php
# Even where it is not strictly necessary, for consistency and easier reading of the configuration:
RewriteRule ^news/(.*)$ index.php

# If only digits should match, say so explicitly.
# If some digits are fixed, spell them out explicitly.
# If the rest of the request cannot contain slashes, exclude them.
# Do not forget to escape "." (dots).
# The following rule targets requests like http://example.com/news/2009/07/28/b-effect.html
RewriteRule ^news/20[0-9]{2}/[0-9]{2}/[0-9]{2}/[^/]+\.html$ index.php
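The effect of these tips can be checked quickly with Python's `re` module, which treats these particular patterns the same way PCRE does. The date pattern below ends with `$`, as the first tip recommends.

```python
import re

# An unescaped "." matches any character, so this pattern is too broad.
loose = re.compile(r"^news.php$")
strict = re.compile(r"^news\.php$")

# Narrow date-based pattern, anchored at both ends.
dated = re.compile(r"^news/20[0-9]{2}/[0-9]{2}/[0-9]{2}/[^/]+\.html$")

for candidate in ["news.php", "news-php", "newsXphp"]:
    print(candidate, bool(loose.search(candidate)), bool(strict.search(candidate)))
# news.php True True
# news-php True False
# newsXphp True False

print(bool(dated.search("news/2009/07/28/b-effect.html")))    # True
print(bool(dated.search("news/2009/07/28/a/b-effect.html")))  # False
```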


Changing external redirects

Although mod_rewrite lets you use RewriteRule to modify even external redirects, down to the protocol, I strongly advise against it. The article changes external redirects only to move away from concepts like "URLs" and "files" and to show more clearly that RewriteRule works with a plain string.

I doubt the developers of mod_rewrite expected anyone to do this, so any artifacts are possible. Please do not do it.

How to stop an endless loop

Sometimes the redirect logic on a site is such that, without special measures, mod_rewrite treats it as an endless redirect loop. Take the following example.

A site had the page /info.html. An SEO specialist decided that search engines would index this page better if it were called /information.html and asked for an external redirect from info.html to information.html. However, for whatever reason, the site developer cannot simply rename info.html to information.html and redirect; the content must be served directly from the info.html file. He writes the following rules:

# make an external redirect
RewriteRule ^info.html information.html [R,L]
# but on a request for /information.html, still serve info.html
RewriteRule ^information.html info.html

...and runs into an endless loop. Every request for /information.html gets an external redirect to /information.html again.

This problem can be solved in at least two ways. One of them has already been described on Habr: set an environment variable and stop redirecting based on its value. The code looks like this:

RewriteCond %{ENV:REDIRECT_FINISH} !^$
RewriteRule ^ - [L]

RewriteRule ^info.html$ information.html [R,L]
RewriteRule ^information.html$ info.html [E=FINISH:1]

Note that after the internal redirect, the variable's name is prefixed with 'REDIRECT_'.

The second way is to check in THE_REQUEST what exactly the user requested:

# The external redirect happens only if the user requested info.html.
# If info.html is the result of an internal redirect, the rule does not fire.
RewriteCond %{THE_REQUEST} "^(GET|POST|HEAD) /info.html HTTP/[0-9.]+$"
RewriteRule ^info.html$ information.html [R,L]

RewriteRule ^information.html$ info.html
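Why the THE_REQUEST check breaks the loop can be seen by running the same pattern against two request lines in Python. The request lines are assumed examples, and `should_redirect` is a hypothetical helper.

```python
import re

# Same pattern as the RewriteCond above.
PATTERN = re.compile(r"^(GET|POST|HEAD) /info.html HTTP/[0-9.]+$")


def should_redirect(the_request: str) -> bool:
    """True only if the client literally asked for /info.html."""
    return PATTERN.search(the_request) is not None


# The browser's own request line triggers the external redirect...
print(should_redirect("GET /info.html HTTP/1.1"))         # True
# ...but after the internal rewrite to info.html, THE_REQUEST still holds
# the original request line for /information.html, so no loop occurs.
print(should_redirect("GET /information.html HTTP/1.1"))  # False
```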

Analyzing the original user request: working around Apache's URL decoding

While processing a request, Apache decodes URL-encoded characters in the original request. In some cases this is undesirable, because the developer wants to check the original, unmodified user request. This can be done by checking the %{THE_REQUEST} variable in RewriteCond:

RewriteCond %{THE_REQUEST} ^GET[\ ]+/tag/([^/]+)/[\ ]+HTTP.*$
RewriteRule ^(.*)$ index.php?tag=%1 [L]
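A small Python illustration of the difference between the raw request line and the decoded path. The request line and tag are made-up examples, and `urllib.parse.unquote` serves only as a stand-in for Apache's own decoding, which may differ in details.

```python
import re
from urllib.parse import unquote

# Hypothetical raw request line, as %{THE_REQUEST} would hold it.
THE_REQUEST = "GET /tag/caf%C3%A9/ HTTP/1.1"

m = re.search(r"^GET[\ ]+/tag/([^/]+)/[\ ]+HTTP.*$", THE_REQUEST)
raw_tag = m.group(1)

print(raw_tag)           # caf%C3%A9  (still URL-encoded, like %1)
print(unquote(raw_tag))  # café       (roughly what the decoded path holds)
```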

Recommended Documentation

Apache official documentation

Technical details