All about CGI (Common Gateway Interface)
What is CGI?
Let's start with the terminology. CGI (Common Gateway Interface) is an interface that allows a web server to run programs at the request of a browser and to return the results of their work to the browser. A CGI program (script) is a program (script) that runs on the server and communicates with the browser through this interface. Since the terms are not strictly defined, "CGI" is very often used to mean the program (script) rather than the interface itself.
If it is a program, it must be in an executable format supported by the particular operating system. Programs can be written in anything: C/C++, Pascal, Java, Visual Basic, Delphi, etc.
If it is a script, the operating system on which the web server runs must have the corresponding script interpreter: shell, perl, tcl/tk, command.com, etc.
The main requirement for a CGI program (script) development tool is that it must be able to:
- read from the standard input stream (stdin)
- read the values of environment variables
- write to the standard output stream (stdout)
What CGI is used for:
- Working with help systems and databases.
- Creating dynamic HTML documents and resources (including counters, guestbooks, etc.)
- Remote administration of various systems.
- Simply working with various programs, because an HTML interface is quite easy to use, easy to build, and nice looking.
The working mechanism of CGI programs
As already mentioned, a CGI program gets its input from the standard input stream (stdin) or from environment variables, and writes the results of its work to the standard output stream (stdout). For those who do not know what these are:
Standard input stream (stdin) - where the program (script) gets its input by default. Usually this is the keyboard, but it can be redirected, and the program (script) will then get its input from a file, a socket, or the output stream of another program.
Environment variables - variables defined for the system and for the server on which the CGI program will run.
Standard output stream (stdout) - where the program (script) writes the results of its work. Usually this is the "monitor", but it can be redirected to a file, a socket, the input stream of another program, a printer, etc.
Most examples in this manual are written in shell, solely to simplify the presentation of the material.
For security reasons, using the shell to write CGI scripts is not recommended.
1.1 CGI call without parameters
A very simple script outputting the current date:
#!/bin/sh
echo Content-type: text/html
echo
echo "<p>Today is"
date
echo "</p>"
In an HTML document, the script is referenced by an ordinary link whose address is the URL of the script.
IMPORTANT NOTE The basic mistake that almost everyone who starts writing CGI programs or scripts makes is forgetting to output the type of the result, that is, the header of the output document. In the example this is the second line.
The empty line in the output marks the end of the header and the start of the document itself.
1.2 Passing parameters to the CGI script or program
Parameters are passed as name=value pairs using one of two basic methods: GET and POST. Each has its advantages and disadvantages.
When GET is used, the parameters are appended to the requested URL after a question mark, and the script receives them in the QUERY_STRING environment variable.
The text of the script itself:
#!/bin/sh
echo Content-type: text/html
echo
echo "<p>You sent this:</p>"
echo "<pre>"
set | grep QUERY_STRING
echo "</pre>"
However, using the GET method to pass parameters that contain confidential information is unacceptable, because all of that information is transmitted in the open as part of the URL.
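To work with individual parameters rather than the whole string, a script has to split QUERY_STRING itself. The following is a minimal sketch, assuming the usual name=value&name=value layout; print_query_pairs is a hypothetical helper name, the default query string is made up for a dry run, and %-decoding of the values is deliberately left out.

```shell
#!/bin/sh
# Sketch: split a GET query string into name/value pairs.
# print_query_pairs is a hypothetical helper name, not part of CGI.
print_query_pairs() {
    old_ifs=$IFS
    set -f                    # no filename expansion on values such as "*"
    IFS='&'                   # split the argument on "&" only
    for pair in $1; do
        name=${pair%%=*}      # text before the first "="
        value=${pair#*=}      # text after the first "="
        if [ "$name" = "$pair" ]; then
            value=            # the pair had no "=" at all
        fi
        printf '%s = %s\n' "$name" "$value"
    done
    IFS=$old_ifs
    set +f
}

# A web server would set QUERY_STRING; the default is only for a dry run.
print_query_pairs "${QUERY_STRING:-name=guest&lang=en}"
```

The values are printed exactly as received, so a real script would still need to decode %XX sequences and escape the text before placing it in HTML.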
The POST method keeps the parameters out of the URL, which makes it better suited to confidential data. It passes the parameters to the script through the standard input stream, and you have to use forms to send them. The server does not send EOF to the script at the end of the data. Instead, use the CONTENT_LENGTH environment variable to determine how many bytes to read from stdin.
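Reading the POST body can be sketched as follows. This is an illustration under stated assumptions, not a complete form handler: read_post_body is a hypothetical helper name, and dd is used here only because it stops after a fixed number of bytes instead of waiting for EOF.

```shell
#!/bin/sh
# Sketch: read exactly CONTENT_LENGTH bytes of a POST body from stdin.
# read_post_body is a hypothetical helper name.
read_post_body() {
    # Treat a missing or non-numeric CONTENT_LENGTH as "no body".
    case "${CONTENT_LENGTH:-}" in
        ''|*[!0-9]*) return 0 ;;
    esac
    # Copy the requested number of single-byte blocks, then stop.
    dd bs=1 count="$CONTENT_LENGTH" 2>/dev/null
}

echo "Content-type: text/plain"
echo
echo "You sent this:"
read_post_body
```

Reading one byte at a time is slow for large bodies, but it keeps the sketch short and never reads past the announced length.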
Creating a counter
Lately the number of people who want to put a visit counter on their page has been growing at a frantic pace. The Internet has plenty of places where you can get counters for any operating system and attach them to your pages.
This chapter will be most useful to those interested in how counters work, since the examples given here have no special "bells and whistles" such as settings, administration, and so on. More "sophisticated", ready-to-use counters can be found through Altavista, Yahoo, and other search engines. Or ask in the relevant newsgroups (relcom.www.users, relcom.www.support; in the Fido echoes ru.internet.*).
2.1 Types of counters
By their working mechanism, counters can be roughly divided into two types:
- CGI scripts that work through Server Side Includes (SSI)
- CGI scripts that do not use Server Side Includes
An SSI call is written in the HTML document as a comment of the form <!--#command attribute="value" -->, where #command is any of the commands understood by the Web server. The most interesting one here is the #exec command, which runs a program and substitutes the results of its work. Documents parsed by the Web server in this way are called server-parsed documents.
2.2 Visiting counter working as SSI
The algorithm is as follows:
- The server receives a request from the browser for an HTML document.
- The server scans the document for SSI calls.
- If such calls are found, each one is replaced by its result. For the #exec command, this is the output of the program specified in "value".
- The generated HTML document is sent back to the browser.
Required server settings (using the Apache server as an example):
- In the file srm.conf, add (if not already present):
AddType text/html .shtml
AddHandler server-parsed .shtml
These directives tell the server that files with the .shtml extension are server-parsed documents.
- In the file access.conf, for the directory where the server-parsed documents will reside, add the Includes option to Options.
- Give files that contain SSI calls the extension .shtml (see item 1).
The counter is called from the body of the document by the command:
<!--#exec cgi="/cgi-bin/counter" -->
On a live server this call is replaced by the current count (reload the page to watch it grow).
This counter is a text counter, i.e. the script returns plain text, which is displayed as is. Images can be inserted in the same way. To do this, replace the text digits with img src="image_of_the_corresponding_digit" tags. An inquisitive reader will easily guess that the number of img tags equals the number of digits in the value returned by the counter.
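The text of such a counter script is not shown here, so the following is only a minimal sketch of what it might look like. COUNTER_FILE, its default path, and next_count are hypothetical names chosen for the example, and no file locking is done, so simultaneous requests may lose a count.

```shell
#!/bin/sh
# Sketch of a text counter suitable for <!--#exec cgi="..." -->.
# COUNTER_FILE and next_count are hypothetical names; the default path
# is only a placeholder for a file the server is allowed to write.
COUNTER_FILE=${COUNTER_FILE:-${TMPDIR:-/tmp}/counter.txt}

next_count() {
    count=0
    if [ -r "$COUNTER_FILE" ]; then
        count=$(cat "$COUNTER_FILE")
    fi
    case "$count" in
        ''|*[!0-9]*) count=0 ;;   # empty or not a plain number: start over
    esac
    count=$((count + 1))
    echo "$count" > "$COUNTER_FILE"   # store the new value
    echo "$count"                     # and print it for the page
}

echo "Content-type: text/html"
echo
next_count
```

Each call to the script adds one to the stored value and prints the new value as plain text, which the server then places into the page.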
2.3 A counter that does not use SSI
A counter that does not use SSI is simpler from the user's point of view but more complex to program. It works as follows:
- the body of the HTML document contains:
<img src="/cgi-bin/examples/counter.cgi">
i.e. the requested image is not static but is generated dynamically by a CGI script.
- the server, having received the request for the image, runs the script specified in the src of the img tag.
- the script increments the counter by one, generates an image showing the counter value, and returns it to the browser.
Since this type of counter is the most popular on the Internet, its algorithm is considered in more detail.
The script (counter.cgi), which is called in the body of the HTML document by the tag img src="...counter.cgi", is written in shell and has the following source code (line numbers are added only to simplify the explanation):
1: #!/bin/sh
2: now=`date -u`
3: echo "Content-type: image/gif;"
4: echo "Expires: $now"
5: echo
6: counter|showdigits
What this script does, line by line:
1 - The script header. It names the command interpreter that will execute the script.
2 - Defines the now variable, which holds the time the script started (the time the image was created). The '-u' option makes date print the date/time in GMT. Why this is needed is described below.
3 - We start forming the header of the server response. The type of the returned data is specified: image/gif
4 - Since this is a counter, we must make sure that the image with its reading is not cached (otherwise what kind of counter would it be?). To do this, we indicate that the image received by the browser should expire immediately. Here we use the now variable defined in line 2. Using Expires in this form conforms to the standard for HTTP version 1.1. However, when Expires is used this way, odd glitches can appear if the clock on the client lags a few minutes behind the server clock. A dilemma arises: this is what the standard prescribes, yet the result is not quite what you need. What to do? In the previous version of the protocol (HTTP 1.0), Expires could be set to 0, and RFC 2068 says that HTTP 1.1 clients should also support this older usage (Expires: 0). So decide for yourself.
5 - End of the response header: an empty line is output.
6 - The image itself is generated by two programs (counter and showdigits).
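HTTP defines its own date format for headers such as Expires (for example "Sun, 06 Nov 1994 08:49:37 GMT"), and the default output of date -u may not match it on every system. The sketch below shows one way to produce that format explicitly; print_image_headers is a hypothetical helper name, and the sketch covers only the header lines 3 to 5 of the script, not the image itself.

```shell
#!/bin/sh
# Sketch: print the response header for a dynamically generated GIF with
# an Expires value in the HTTP date format.
# print_image_headers is a hypothetical helper name.
print_image_headers() {
    # LC_ALL=C keeps the day and month names in English.
    now=$(LC_ALL=C date -u '+%a, %d %b %Y %H:%M:%S GMT')
    echo "Content-type: image/gif"
    echo "Expires: $now"
    echo                      # the empty line ends the header
}

print_image_headers
```

Replacing the Expires line with echo "Expires: 0" gives the older variant discussed in the description of line 4.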
The counter and showdigits programs are written in C using libgd, a library for working with GIF files. Without it the programs will not compile. The latest version of the library can always be obtained at http://www.boutell.com/gd/ .
How these programs work:
- counter - reads from the file counter.rc the number that represents the previous value of the counter, adds one to it, and writes it back. If the path to the digit image files and the mask of those files are not specified, the defaults defined in the body of the program are used. After that, the computed counter value and the path to the images are written to stdout, which forms the command line for showdigits.
- showdigits - this program actually builds the image with the current counter reading. To do this, it uses a set of ready-made digit images (GIF format, all images the same size) and the data received on stdin from counter. Using the path and the mask, it picks the required images and assembles them into a single GIF. The result then goes straight to ... stdout! The server redirects this stream to the browser, and the browser displays it as an image, since the response header says that it is a GIF.
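The step in which the digit images are selected can be sketched in shell. The source of showdigits is not shown here, so this is only an illustration: digit_files is a hypothetical helper name, and the directory and the "<digit>.gif" naming scheme stand in for the real path and mask.

```shell
#!/bin/sh
# Sketch: turn a counter value into the list of digit image files that a
# program like showdigits would have to join into one picture.
# The directory and the "<digit>.gif" naming scheme are assumptions.
digit_files() {
    value=$1
    dir=$2
    while [ -n "$value" ]; do
        rest=${value#?}            # the value without its first character
        digit=${value%"$rest"}     # the first character itself
        printf '%s/%s.gif\n' "$dir" "$digit"
        value=$rest
    done
}

digit_files 1204 /path/to/digits
```

One file name is printed per digit, in reading order, which is the order in which the images would be placed side by side.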
Server side includes
Static HTML documents are good, of course, but dynamically created ones are even better. So in this chapter we will talk about creating documents dynamically using Server Side Includes. Note that SSI support depends on the particular server. Some servers do not support SSI at all, and those that do may differ in directive format and command set. So read the documentation for your Web server. All the examples in this chapter are for the Apache server.
3.1 What is SSI
As already mentioned in the previous chapter, Server Side Include (SSI) is a Web server directive that allows the server to place any data at the point of the call. In an HTML document, an SSI call looks like a specially formatted comment:
<!--#command attribute="value" -->
where #command is any of the SSI directives understood by the Web server, and "value" is its parameter.
The inserted data can be static or dynamically generated. Static data is prepared in advance and stored as files, fragments of text, or HTML. Such data is convenient when the same fragments have to appear in several HTML documents. Dynamically generated data is the result of running CGI scripts or commands of the operating system on which the Web server runs. Using this type of data gives the Web developer great opportunities. But, as the tiresome TV commercial goes, "Don't forget about sugar-free Orbit!". That is, REMEMBER TO TAKE MEASURES TO KEEP ACCESS TO INFORMATION SECURE! Incorrect use of SSI can open the way to unauthorized access to information and, accordingly, to various grave consequences.
3.2 Basic SSI directives
config - controls various aspects of document parsing. Attributes:
- errmsg - the error message returned to the client if an error occurs while parsing the document.
- sizefmt - sets the format in which file sizes are printed (bytes, kilobytes, megabytes).
- timefmt - sets the date/time format.
echo - prints the value of one of the environment variables listed below. Attributes:
- var - the name of the variable to print.
exec - executes the specified command or CGI script. Attributes:
- cgi - specifies the (%-encoded) URL-relative path to the CGI script. If the path does not begin with (/), it is taken relative to the current document. The CGI script is passed the PATH_INFO and QUERY_STRING values of the original client request.
- cmd - the server executes the specified string using the operating system shell.
fsize - prints the size of the specified file, taking sizefmt into account. Attributes:
- file - specifies the path to the file relative to the directory containing the document being parsed.
- virtual - specifies the (%-encoded) URL-relative path to the file. If the path does not begin with (/), it is taken relative to the current document.
flastmod - prints the date/time of the last modification of the specified file, taking timefmt into account. The attributes are the same as for fsize.
include - inserts the text of another document or file into the document being parsed. It is very useful for fragments repeated in different documents. Attributes:
- file - specifies the path to the file relative to the directory containing the document being parsed.
- virtual - specifies the (%-encoded) URL-relative path to the file. If the path does not begin with (/), it is taken relative to the current document.
In Apache, included files can be nested.
printenv - prints a list of all existing variables and their values. It has no attributes. Example:
<!--#printenv -->
set - sets the value of a variable. Attributes:
- var - the name of the variable to set.
- value - the value to give to the variable.
Example:
<!--#set var="variable_1" value="some_value_of_variable_1" -->
3.3 SSI environment variables
DOCUMENT_NAME - the file name. In the body of the document:
<!--#echo var="DOCUMENT_NAME" -->
DOCUMENT_URI - the virtual path to the file. In the body of the document:
<!--#echo var="DOCUMENT_URI" -->
QUERY_STRING_UNESCAPED - the decoded query string, with all shell metacharacters preceded by "\". In the body of the document:
<!--#echo var="QUERY_STRING_UNESCAPED" -->
DATE_LOCAL - the current date and time (local). In the body of the document:
<!--#echo var="DATE_LOCAL" -->
DATE_GMT - the current date and time (GMT). In the body of the document:
<!--#echo var="DATE_GMT" -->
LAST_MODIFIED - the date and time of the last modification of the file. In the body of the document:
<!--#echo var="LAST_MODIFIED" -->
3.4 Configuring the server
For the server to know where in a document to substitute the data, it must parse that document. Documents parsed by the server are called server-parsed documents.
First of all, you need to tell the server which documents it should parse. To do this, add the following directives to the configuration file (for old versions of Apache and for NCSA web servers this is the srm.conf file; for newer versions of Apache, for example 1.3.4, it is httpd.conf). Apache server:
AddType text/html .shtml
AddHandler server-parsed .shtml
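"Why specify a separate extension for server-parsed documents?", the inquisitive reader will ask. We answer: of course, nothing stops you from adding a line to the configuration file that makes the server parse ordinary .html files as well. But then the server will parse every HTML document, including those that contain no SSI calls, which puts unnecessary load on it.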
It should also be remembered that placing an SSI call in the output of a CGI program is useless, because that output is not parsed by the server.
For more information on configuring your server for SSI, read the documentation for your server.
Appendix 1. Variables of the server environment
The following is a list of the main server environment variables with a brief description of their purpose. The server in this case is Apache 1.2.5 with the PHP/FI-2.0.1 module. On other web servers (MS IIS, Netscape, NCSA httpd, etc.) the variables may differ.
REMOTE_HOST - the name of the host that is connected to the server. When working through a proxy, the name of the proxy.
Example: REMOTE_HOST = lom.pvrr.ru
REMOTE_ADDR - the IP address of the host connected to the server. When working through a proxy, the IP address of the proxy.
Example: REMOTE_ADDR = 22.214.171.124
REMOTE_PORT - the port number of the client.
Example: REMOTE_PORT = 3381
HTTP_USER_AGENT - the name, version number, etc. of the client (browser). The use of this variable sometimes drives individual Internet users into a frenzy. But in itself it is a very useful thing, for example for auto-detecting Russian encodings.
Example: HTTP_USER_AGENT = Mozilla/4.07 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386)
HTTP_ACCEPT - the data types other than text/html that the client (browser) accepts.
Example: HTTP_ACCEPT = image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
HTTP_ACCEPT_CHARSET - which character sets the client (browser) understands.
Example: HTTP_ACCEPT_CHARSET = iso-8859-1, *, utf-8
HTTP_ACCEPT_LANGUAGE - which languages the client (browser) accepts.
Example: HTTP_ACCEPT_LANGUAGE = nl, nl-BE, en
* * *
SERVER_NAME - the server name corresponding to the IN A record in DNS, or the value of the ServerName (or similar) variable in the server config.
Example: SERVER_NAME = arche.pvrr.ru
HTTP_HOST - the name of the server or virtual host that the client accesses. The value of HTTP_HOST may be equal to the value of SERVER_NAME.
Example: HTTP_HOST = www.pvrr.ru
SERVER_SOFTWARE - the software used as the server.
Example: SERVER_SOFTWARE = Apache/1.2.5 PHP/FI-2.0.1
DOCUMENT_ROOT - the path to the "root" of the web server from the "root" of the file system of the computer on which it is running.
Example: DOCUMENT_ROOT = /usr/local/www/html
HTTP_CONNECTION - the connection type.
Example: HTTP_CONNECTION = keep-alive
SERVER_PROTOCOL - the protocol used to communicate with a specific client.
Example: SERVER_PROTOCOL = HTTP/1.0
REQUEST_URI - the name of the requested resource/document, including the path from the root of the web server. When the server root or a directory is requested, this variable is set to the directory name, or to "/" for the server root.
DOCUMENT_URI - the name of the requested resource/document, including the path from the root of the web server. Usually set when SSI is invoked. Unlike REQUEST_URI, when a directory or the server root is requested, this variable also contains the name of the file that is the Directory Index of that directory.
HTTP_REFERER - the full URL of the document from which you followed a link to this server. This variable can be used when writing counters.
GATEWAY_INTERFACE - the name/version of the interface through which the server works with the script.
SCRIPT_FILENAME - the script name, including the full path from the "root" of the file system.
SCRIPT_NAME - the script name, including the path from the "root" of the web server.
REQUEST_METHOD - the method used by the client to send data to the server. Possible values include GET, HEAD, POST, PUT.
QUERY_STRING - this variable is set when data is passed to the server with the GET method.
CONTENT_LENGTH - this variable is set to the number of bytes sent by the browser to the server with the POST method.
REMOTE_USER - the user name. Passed only if access to the CGI script is authenticated.
PATH_INFO - additional path information passed by the client. That is, a script can receive parameters containing information about a "path" to some data (for example, to a configuration file needed to process the request of this particular client). This path is "virtual", i.e. it starts from the "root of the web server". Other data can be passed as usual, with the GET or POST method.
PATH_TRANSLATED - the same as PATH_INFO, except that the path is physical, i.e. "from the root of the file system".
REMOTE_IDENT - if the HTTP server supports identification according to RFC 931, this variable is set to the user name obtained from the server.
SERVER_ADMIN - the e-mail address of the web server administrator.
SERVER_PORT - the port on which the web server "listens".
* * *
HTTP_X_FORWARDED_FOR - when working through a proxy, the IP address of the client behind the proxy.
HTTP_VIA - the name, port number, and software type of the proxy server.
Example: HTTP_VIA=1.0 proxy1.pvrr.ru:8080 (Squid/2.1.PATCH1)
HTTP_CACHE_CONTROL - something related to the age of the document in the proxy server's cache. I won't lie: I don't know.
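Since the set of variables differs from server to server, a quick way to see what a particular server provides is a small script that prints some of them. The following is a minimal sketch; show_var is a hypothetical helper name, and the list of variables in the loop is just a selection from this appendix.

```shell
#!/bin/sh
# Sketch: a CGI script that reports a few server environment variables.
# show_var is a hypothetical helper name.
show_var() {
    # eval reads the variable whose name is in $1. The names come from the
    # fixed list below and never from the request itself.
    eval "value=\${$1:-}"
    printf '%s = %s\n' "$1" "${value:-(not set)}"
}

echo "Content-type: text/plain"
echo
for name in REQUEST_METHOD QUERY_STRING CONTENT_LENGTH REMOTE_ADDR \
            HTTP_USER_AGENT SERVER_NAME SERVER_PORT SCRIPT_NAME; do
    show_var "$name"
done
```

The output is sent as text/plain so that the values do not need HTML escaping.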