All about CGI (Common Gateway Interface)

What is CGI?

First, the terminology. CGI (Common Gateway Interface) is an interface that lets a web server run a program at the request of a browser and return the result of its work to the browser. A CGI program (script) is a program (script) that runs on the server and communicates with the browser through this interface. Because the terms are not strictly defined, "CGI" is very often used to mean the program (script) rather than the interface itself.

If it is a program, it must be in an executable format acceptable to the particular operating system. Programs can be written in anything: C/C++, Pascal, Java, Visual and plain Basic, Delphi, etc.

If it is a script, then the operating system on which the web server runs must have the corresponding script interpreter: shell, perl, tcl/tk, etc.

The main requirement for a CGI program (script) development tool is that it can:
  • read from the standard input stream (stdin)
  • obtain the values of environment variables
  • write to the standard output stream (stdout)

What CGI is used for:

  • Working with help systems and databases.
  • Creating dynamic HTML documents and resources (including counters, guestbooks, etc.)
  • Remote administration of various systems.
  • Simply working with various programs, because an HTML interface is easy to use, easy to build and nice to look at :)

The working mechanism of CGI programs

As already mentioned, a CGI program obtains its input data from the standard input stream (stdin) or from environment variables, and writes the results of its work to the standard output stream (stdout). For those who do not know what these are:

Standard input stream (stdin) - where the program (script) gets its input by default. Usually this is the keyboard, but it can be redirected so that the program (script) receives input from a file, a socket, or the output stream of another program.

Environment variables are variables defined for the system and for the server on which the CGI program runs.

Standard output stream (stdout) - where the program (script) writes the results of its work. Usually this is the "monitor", but it can be redirected to a file, a socket, the input stream of another program, a printer, etc.

Most examples in this manual are written in shell, solely to simplify the presentation of the material.

Using the shell to write CGI scripts is not recommended, for security reasons.

1.1 CGI call without parameters

A very simple script that prints the current date:

    #!/bin/sh
    echo Content-type: text/html
    echo
    echo "<h1>Today is"
    date
    echo "</h1>"

In an HTML document, a reference to it is written as an ordinary hyperlink (<a href=...>) whose URL is the address of the script.

IMPORTANT NOTE: The basic mistake that almost everyone who starts writing CGI programs or scripts makes is forgetting to state the type of the result being output, that is, the header of the output document. In the example these are the second and third lines:

    echo Content-type: text/html
    echo

where Content-type: is the type of the output document (text/html, image/gif, image/jpeg, etc.).
An empty line at the end signals that the header is over and the document itself follows.
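
The header rule above can be wrapped in a small shell function. This is only a sketch: the name cgi_header is hypothetical and does not appear in the original text. It does nothing more than print the Content-type line and the empty line that ends the header.

```shell
#!/bin/sh
# cgi_header is a hypothetical helper, not part of the original text.
# It prints the Content-type header followed by the mandatory empty line.
cgi_header() {
    # Fall back to text/html when no type is given.
    printf 'Content-type: %s\n\n' "${1:-text/html}"
}

# Usage: emit the header first, then the document body.
cgi_header text/plain
echo "Hello from CGI"
```

Keeping the two lines in one place makes it harder to forget the empty line when a script gains more output later.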

1.2 Passing parameters to the CGI script or program

Parameters are passed to a script using one of two basic methods: GET and POST. Each has its advantages and disadvantages.

If you use GET, the parameters are appended to the requested URL, so the script can be called like this:

http://some_host/cgi-bin/some_script?options

which makes it possible to link to such a script from HTML documents. On the server, the passed parameters are assigned to the QUERY_STRING variable.

The text of the script itself:

    #!/bin/sh
    echo Content-type: text/html
    echo
    echo "<h2>You sent this here:</h2>"
    echo "<pre>"
    set | grep QUERY_STRING
    echo "</pre>"
    echo "<h2>Environment</h2>"
    echo "<pre>"
    set
    echo "</pre>"

It is called from an HTML document by an ordinary link whose URL ends with the script name, a question mark and the parameters.

However, using the GET method to transmit parameters containing confidential information is unacceptable, because all of this information is transmitted openly as part of the URL.
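
A script usually needs the individual parameters rather than the whole QUERY_STRING. The sketch below shows one way to split it, assuming the usual name=value&name=value layout. The function name print_query_pairs and the fallback sample string are hypothetical, and URL-decoding of the values is deliberately left out.

```shell
#!/bin/sh
# print_query_pairs is a hypothetical helper. It splits a query string on
# "&" and prints each name with its value; no URL-decoding is performed.
print_query_pairs() {
    old_ifs=$IFS
    set -f                 # no filename expansion while splitting
    IFS='&'
    set -- $1              # one positional parameter per name=value pair
    IFS=$old_ifs
    set +f
    for pair in "$@"; do
        name=${pair%%=*}
        case $pair in
            *=*) value=${pair#*=} ;;   # text after the first "="
            *)   value= ;;             # bare name without a value
        esac
        printf '%s -> %s\n' "$name" "$value"
    done
}

# Use the real QUERY_STRING when the server sets it, a sample otherwise.
print_query_pairs "${QUERY_STRING:-button=on&name=guest}"
```

Values that arrive this way come from the client and should be treated as untrusted before they are used in any command.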

The POST method gives more confidentiality when passing parameters to a script, since they do not appear in the URL. It passes them on the standard input stream instead, and you have to use forms to do this. The server does not send EOF to the script at the end of the data. Instead, you need to use the CONTENT_LENGTH environment variable to determine how many bytes to read from stdin.
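
Reading exactly CONTENT_LENGTH bytes can be sketched as follows. The function name read_post_body is hypothetical. dd with a block size of one byte is used here because it stops after the requested number of bytes instead of waiting for EOF.

```shell
#!/bin/sh
# read_post_body is a hypothetical helper. It reads exactly CONTENT_LENGTH
# bytes from stdin instead of waiting for an EOF that may never come.
read_post_body() {
    len=${CONTENT_LENGTH:-0}
    if [ "$len" -gt 0 ]; then
        # bs=1 makes dd copy single bytes, so it stops after $len of them.
        dd bs=1 count="$len" 2>/dev/null
    fi
}

# Only touch stdin when the request really is a POST.
if [ "$REQUEST_METHOD" = "POST" ]; then
    echo Content-type: text/plain
    echo
    echo "Body: $(read_post_body)"
fi
```

The body read this way has the same name=value&name=value layout as QUERY_STRING when it comes from an ordinary HTML form.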

Creating a counter

Recently, the number of people wishing to attach a visit counter to their page has been growing at a frantic pace. The Internet has plenty of places where you can get counters for any operating system and bolt them onto your pages.

This chapter of the manual will be more useful to those interested in how counters work, since the examples given have no special "bells and whistles" such as settings, administration and so on. More "sophisticated", ready-to-use counters can be found through Altavista, Yahoo and other search engines, or by asking in the relevant newsgroups (relcom.www.users; in the FidoNet echoes ru.internet.*).

2.1 Types of counters

By working mechanism, counters can be roughly divided into two types:
  1. CGI scripts working through Server Side Include
  2. CGI scripts that do not use Server Side Include
Server Side Include (SSI) is a special type of HTML comment that tells the Web server to substitute dynamically generated data at the place of the call. The basic format of an SSI call in the body of an HTML document is as follows:

<!--#command tag="value"...-->

where #command is any of the commands understood by the Web server. In this case the most interesting one is the #exec command, which executes a program and substitutes the result of its work. Documents analyzed by the Web server are called server-parsed documents.

2.2 A visit counter working through SSI

The work algorithm:

  1. The server receives a request from the browser for an HTML document.
  2. The server scans the document for SSI calls.
  3. If such calls are found, their results are substituted in their place. In the case of the #exec command, this is the output of the program specified in "value".
  4. The generated HTML document is sent back to the browser.

Required server settings (using the Apache server as an example):

  1. In the file srm.conf add (if not already present):
       AddType text/html .shtml
       AddHandler server-parsed .shtml
     These directives tell the server that files with the .shtml extension are server-parsed documents.
  2. In the file access.conf, for the directory where the server-parsed documents will be located, add the Includes option to Options.
  3. Give files containing SSI calls the extension .shtml (see item 1)
We demonstrate the operation of the counter using a counter script found on the Internet. It is written in Perl, which is so popular today.

This counter is textual, i.e. the script simply returns text, which is then displayed. Images can be inserted in the same way. To do this, replace the text digits with img src="picture_of_the_corresponding_digit" tags. An inquisitive reader will easily guess that the number of img src tags equals the number of digits in the value returned by the counter.
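
The replacement of digits by img tags can be sketched as a shell function. Both the name digits_to_img and the /digits/N.gif layout of the picture files are assumptions made only for this illustration; the input is not checked for non-digit characters.

```shell
#!/bin/sh
# digits_to_img is a hypothetical helper; the /digits/N.gif path layout
# is an assumption made only for this illustration.
digits_to_img() {
    rest=$1
    while [ -n "$rest" ]; do
        tail=${rest#?}           # everything after the first character
        digit=${rest%"$tail"}    # the first character itself
        printf '<img src="/digits/%s.gif" alt="%s">' "$digit" "$digit"
        rest=$tail
    done
    echo
}

# One img tag is printed per digit of the counter value.
digits_to_img 407
```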

The call to this counter in the body of the document is made with the command: <!--#exec cgi="/cgi-bin/counter"-->
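
The Perl script referred to above is not reproduced here. As a rough illustration of what a text counter does, here is a minimal shell sketch. The function name bump_counter, the COUNTER_FILE variable and its default path are all hypothetical, and the sketch does no file locking, so simultaneous requests could lose an increment.

```shell
#!/bin/sh
# Hypothetical text counter; COUNTER_FILE and its default path are
# assumptions for this sketch, not the script the original refers to.
COUNTER_FILE=${COUNTER_FILE:-/tmp/counter.txt}

bump_counter() {
    # Read the previous value; a missing or empty file counts as 0.
    count=$(cat "$COUNTER_FILE" 2>/dev/null)
    count=$(( ${count:-0} + 1 ))
    # Store the new value for the next request, then print it.
    echo "$count" > "$COUNTER_FILE"
    echo "$count"
}

# The header comes first, then the counter value as the document body.
echo Content-type: text/html
echo
bump_counter
```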

2.3 A counter that does not use SSI

Simpler from the user's point of view, but more complex to program, is a counter that does not use SSI at all. Such a counter works as follows:

  • the body of the HTML document contains <img src=/cgi-bin/examples/counter.cgi>, i.e. the requested picture is not static but is generated dynamically by a CGI script.
  • the server, having received the request for the picture, runs the script given in the src of the img tag.
  • the script increases the counter value by one, generates a picture showing the counter value and returns it to the browser.

Since this type of counter is the most popular on the Internet, its algorithm is considered in more detail.

The script (counter.cgi), which is called in the body of the HTML document by the tag img src="...counter.cgi", is written in shell and has the following source code (line numbers added only to simplify the explanation):

    1: #!/bin/sh
    2: now=`date -u`
    3: echo "Content-type: image/gif"
    4: echo "Expires: $now"
    5: echo
    6: counter|showdigits

What this script does, line by line:
1 - The header of the script itself. It names the command interpreter that will execute it.

2 - Define the variable now, which holds the start time of the script (the time the image was created). The '-u' option makes the date/time come out in GMT. Why this is needed is described below.

3 - We begin forming the header of the server response. Specify the type of the returned data: image/gif

4 - Since this is a counter, the picture with its reading must not be cached (otherwise what kind of counter would it be :). To achieve this, we state that the image received by the browser should expire immediately. Here we use the now variable defined in line 2. Using Expires in this form conforms to the HTTP 1.1 standard. But with Expires the display of the document can suffer from funny glitches if the clock on the client lags a few minutes behind the server clock. A dilemma arises: this is what the standard prescribes, yet it turns out to be not quite what you need. What to do? In the previous version of the protocol (HTTP 1.0) Expires could be set to 0, and RFC 2068 says that clients running HTTP 1.1 must also support this old form of Expires (Expires: 0). So decide for yourselves, dear Russians.

5 - End of the response header: output an empty line.

6 - Using two programs (counter and showdigits), generate the picture itself.
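
Line 4 passes the default output of date -u to Expires, which is not in the date layout that HTTP defines. A sketch that produces that layout, for example "Sun, 06 Nov 1994 08:49:37 GMT", could look like this. The function name http_date is hypothetical, and only the header part of the counter script is shown.

```shell
#!/bin/sh
# http_date is a hypothetical helper name. It prints the current time in
# the HTTP date layout, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
http_date() {
    # LC_ALL=C keeps day and month names in English regardless of locale.
    LC_ALL=C date -u '+%a, %d %b %Y %H:%M:%S GMT'
}

# Header part of the counter script only; the image data would follow.
echo "Content-type: image/gif"
echo "Expires: $(http_date)"
echo
```

Whether to send the current time or the older "Expires: 0" form remains the judgment call described for line 4.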

The counter and showdigits programs are written in C using libgd, a library for working with GIF files. Without it the programs will not compile. The latest version of the library can always be obtained on the Internet.

How these programs work:

  • counter - reads from the file counter.rc the number that is the previous counter value, adds one to it and writes it back. If the path to the digit picture files and the mask of these files are not specified, it uses the defaults defined in the body of the program. After that, the computed counter value and the path to the pictures are written to stdout, which forms the input line for showdigits.
  • showdigits - this is the program that actually builds the picture with the current counter reading. To do so it uses a set of ready-made digit images (GIF format, all the same size) and the data received on stdin from counter. Using the path and the mask, it takes the required pictures and assembles them into a single GIF. After that the GIF goes straight to ... stdout! The server then redirects this stream to the browser, and the browser displays it as a picture, since the response header says it is a GIF.
The essence here is:
  • The server sends a stream of data to the browser.
  • The browser does not care where and how the server obtained the data stream passed to it (whether it is static or dynamically generated, an ordinary file or the result of a script's activity); what matters is that the browser knows how to interpret it correctly. This is what the header is for, which in this example is generated by the counter.cgi script in lines 3-5 (see above). For static files the server generates this header itself based on its own settings, but in the case of CGI the script must do it.

Server side includes

Static HTML documents are good, of course, but dynamically created ones are even better. :) So in this chapter we talk about creating documents dynamically using Server Side Includes. Note, by the way, that SSI support is a feature of each particular server. Some servers do not support SSI at all, and among those that do, the formats and command sets may differ. So read the documentation for your Web server. All the examples in this chapter are for the Apache server.

3.1 What is SSI

As already mentioned in the previous chapter, Server Side Include (SSI) is a Web server directive that lets the server insert data at the place of the call. In an HTML document, an SSI call looks like a comment of the form:

<!--#command tag="value"...-->

where #command is any of the SSI directives understood by the Web server, and "value" is its parameter.

The inserted data can be either static or dynamically generated. Static data is ready-made and stored as files: fragments of text or HTML. Such data is convenient when the same fragments have to be repeated in various HTML documents. Dynamically generated data is the result of running CGI scripts or commands of the operating system on which the particular Web server runs. Using this type of data gives the Web developer great opportunities. But, as the idiotic Russian-bourgeois advertisement puts it, "Do not forget about sugar-free Orbit!". That is, REMEMBER TO TAKE MEASURES TO KEEP ACCESS TO INFORMATION SECURE! Incorrect use of SSI can open the way to unauthorized access to information and, accordingly, to various grave consequences.

3.2 Basic SSI directives

config - manages various aspects of parsing a document.
  Attributes:
    errmsg - the error message returned to the client if an error occurs while parsing the document.
    sizefmt - sets the format of the file size (bytes, kilobytes, megabytes).
    timefmt - sets the date/time format.

echo - prints the value of one of the environment variables listed below.
  Attributes:
    var - the name of the variable to print.

exec - executes the specified command or CGI script.
  Attributes:
    cgi - specifies the (%-encoded) URL-relative path to the CGI script. If the path does not begin with (/), it is taken to be relative to the current document. The CGI script is passed the values of the PATH_INFO and QUERY_STRING variables of the original client request.
    cmd - the server executes the specified string using the operating system shell.

fsize - prints the size of the specified file, taking sizefmt into account.
  Attributes:
    file - specifies the path to the file relative to the current directory containing the analyzed file.
    virtual - specifies the (%-encoded) URL-relative path to the file. If the path does not begin with (/), it is taken to be relative to the current document.

flastmod - prints the date/time of the last modification of the specified file, taking timefmt into account. The attributes are the same as for the fsize command.

include - inserts the text of another document or file into the analyzed document. It is very useful for fragments repeated in different documents.
  Attributes:
    file - specifies the path to the file relative to the current directory containing the analyzed file.
    virtual - specifies the (%-encoded) URL-relative path to the file. If the path does not begin with (/), it is taken to be relative to the current document.
  In Apache, included files can be nested.

printenv - prints a list of all existing variables and their values. It has no attributes. Example:
<!--#printenv -->

set - sets the value of a variable.
  Attributes:
    var - specifies the name of the variable to be set.
    value - specifies the value of the variable being set.
  Example:
<!--#set var="variable_1" value="some_value_of_variable_1" -->

3.3 SSI environment variables

DOCUMENT_NAME - file name
In the body of the document: <!--#echo var="DOCUMENT_NAME" -->

DOCUMENT_URI - virtual path to the file
In the body of the document: <!--#echo var="DOCUMENT_URI" -->

QUERY_STRING_UNESCAPED - the decoded query string, with all shell metacharacters preceded by "\"
In the body of the document: <!--#echo var="QUERY_STRING_UNESCAPED" -->

DATE_LOCAL - current date and time (local)
In the body of the document: <!--#echo var="DATE_LOCAL" -->

DATE_GMT - current date and time (GMT)
In the body of the document: <!--#echo var="DATE_GMT" -->

LAST_MODIFIED - the date and time of the last modification of the file
In the body of the document: <!--#echo var="LAST_MODIFIED" -->

3.4 Configuring the server

For the server to know where in the document to substitute the data, it must analyze the document. Documents analyzed by the server are called server-parsed documents.

First of all, you need to tell the server which documents it must analyze. To do this, add the following parameters to the configuration file (for older versions of Apache and for NCSA web servers this is the srm.conf file; for newer versions of Apache, for example 1.3.4, it is httpd.conf):

Apache server:

AddType text/html .shtml
AddHandler server-parsed .shtml

NCSA Server:

AddType text/x-server-parsed-html .shtml

These parameters say that all files with the .shtml extension are server-parsed, and that before "handing over" such a document to the client the server must analyze it.

Why specify a separate extension for server-parsed documents, the inquisitive reader will ask. We answer. Of course, nothing prevents you from adding to the configuration file the line

AddType text/x-server-parsed-html .html

However, this will make the server analyze every document with the .html extension, even those that contain no SSI calls; the system load will increase and server performance will drop.

Also do not forget that it is pointless to put SSI calls into CGI programs, because their output is not analyzed by the server.

For more detailed information on configuring your server for SSI, read the documentation for your server.


Appendix 1. Server environment variables

Below is a list of the main server environment variables with brief descriptions of their purpose. In this case the server is Apache 1.2.5 with the PHP/FI-2.0.1 module. For other web servers (MS IIS, Netscape, NCSA httpd, etc.) the variables may differ.

REMOTE_HOST - the name of the host that connected to the server. When working through a proxy, the name of the proxy.

REMOTE_ADDR - the IP address of the host that connected to the server. When working through a proxy, the IP address of the proxy.

REMOTE_PORT - the client's port number.
Example: REMOTE_PORT=3381

HTTP_USER_AGENT - the name/version number/etc. of the client (browser). The use of this variable sometimes drives certain Internet users into a frenzy. :) But in fact it is a very useful thing, for example for auto-detecting Russian encodings.
Example: HTTP_USER_AGENT=Mozilla/4.07 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386)

HTTP_ACCEPT - the data types, besides text/html, accepted by the client (browser)
Example: HTTP_ACCEPT=image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*

HTTP_ACCEPT_CHARSET - which charsets the client (browser) understands.
Example: HTTP_ACCEPT_CHARSET=iso-8859-1,*,utf-8

HTTP_ACCEPT_LANGUAGE - which languages the client (browser) accepts.

* * *

SERVER_NAME - the server name corresponding to the IN A record in DNS, or the value of the ServerName (or similar) variable in the server config.

HTTP_HOST - the name of the server or virtual host the client is addressing. The value of HTTP_HOST may be equal to the value of SERVER_NAME.

SERVER_SOFTWARE - which software is used as the server.
Example: SERVER_SOFTWARE=Apache/1.2.5 PHP/FI-2.0.1

DOCUMENT_ROOT - the path to the web server "root" from the "root" of the file system of the computer it runs on.
Example: DOCUMENT_ROOT=/usr/local/www/html

HTTP_CONNECTION - the connection type.
Example: HTTP_CONNECTION=keep-alive

SERVER_PROTOCOL - the protocol used to exchange data with the particular client.

REQUEST_URI - the name of the requested resource/document, including the path from the web server root. When the server root or a directory is requested, this variable is set to the directory name, or to "/" for the server root.
Example: REQUEST_URI=/cgi-bin/tralala/script.cgi

DOCUMENT_URI - the name of the requested resource/document, including the path from the web server root. It is usually set when SSI is invoked. Unlike REQUEST_URI, when a directory or the server root is requested this variable also contains the name of the file that is the Directory Index of that directory.
Example: DOCUMENT_URI=/tralala/index.shtml

HTTP_REFERER - the full URL of the document whose link brought you to this server. This variable can be used when writing counters.

GATEWAY_INTERFACE - the name/version of the interface through which the server works with the script.

SCRIPT_FILENAME - the script name, including the full path from the "root" of the file system.

SCRIPT_NAME - the script name, including the path from the web server "root".
Example: SCRIPT_NAME=/cgi-bin/tralala/script.cgi

REQUEST_METHOD - the method used by the client to send data to the server. Possible values include GET, HEAD, POST, PUT.

QUERY_STRING - this variable is assigned a value when data is sent to the server with the GET method
Example: QUERY_STRING=button=on

CONTENT_LENGTH - this variable is set to the number of bytes sent by the browser to the server when the POST method is used.

REMOTE_USER - the user name. It is passed only if access to the CGI script is authenticated.

PATH_INFO - additional path information passed by the client. That is, the script can receive parameters containing information about some "path" to some data (for example, to a configuration file needed to process the request of this particular client). This path is "virtual", i.e. relative to the "web server root". The remaining data can be passed as usual, with the GET or POST method.
Example: PATH_INFO=/some/path

PATH_TRANSLATED - the same as PATH_INFO, except that the path is physical, "from the root of the file system"

REMOTE_IDENT - if the HTTP server supports identification according to RFC 931, this variable is set to the user name obtained from the server.

SERVER_ADMIN - the e-mail address of the web server administrator.

SERVER_PORT - the port the web server "listens" on.
Example: SERVER_PORT=80

* * *

HTTP_X_FORWARDED_FOR - when working through a proxy, the IP address of the client that is working through the proxy.

HTTP_VIA - the name, port number and type of the proxy server software.
Example: HTTP_VIA=1.0 (Squid/2.1.PATCH1)

HTTP_CACHE_CONTROL - something to do with the age of the document in the proxy server cache :) I will not lie: I do not know :)
Example: HTTP_CACHE_CONTROL=max-age=259200
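
As an illustration of how REMOTE_ADDR and HTTP_X_FORWARDED_FOR relate, the sketch below picks the address to show for a client. The function name client_address is hypothetical. It assumes the forwarded header may contain a comma-separated chain of addresses, of which the first is the original client. Since that header is supplied by the client side, it can be forged and should not be trusted for access decisions.

```shell
#!/bin/sh
# client_address is a hypothetical helper. It prefers the address a proxy
# reports in HTTP_X_FORWARDED_FOR and falls back to REMOTE_ADDR.
client_address() {
    if [ -n "$HTTP_X_FORWARDED_FOR" ]; then
        # The header may hold a comma-separated chain; keep the first entry.
        echo "${HTTP_X_FORWARDED_FOR%%,*}"
    else
        echo "${REMOTE_ADDR:-unknown}"
    fi
}

echo Content-type: text/plain
echo
echo "Client address: $(client_address)"
```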