Given by Nancy McCracken at Jackson State University, Mississippi, in the Fall Semester of 1997. Foils prepared 2 Sept 1997
Summary of Material
CGI, the Common Gateway Interface, is the scheme for interfacing other programs and systems to the Web's HTTP protocol, using the same data protocols as HTTP clients and servers.
In this section, we will cover
|
References:
|
Nancy McCracken |
Courseware: Software Technologies for the WWW |
3-234 Center for Science and Technology |
Syracuse University |
111 College Place |
Syracuse NY 13244-4100 |
September 2, 1997 |
The client sends a request, conforming to the URL standard and formatted with a MIME header, to the server. |
The server parses the request and decides what to do:
|
The CGI program parses the input from the server and MUST generate a response. Even if there is no data to send back, the CGI program must send an error or an empty message, since the HTTP connection is still open and must be closed by the server. The CGI program will send a header to the server:
|
When the CGI program terminates, the server closes the connection. |
Including a link to a CGI program on a web page causes the server to run the program when the link is followed. The server can be configured to recognize a URL as denoting a CGI program either through the file extension .cgi or because the URL points to an executable file in the part of the server's directory space reserved for CGI programs.
This example consists of a simple form with just a submit button to activate the CGI program. Note that no data is being sent from the form to the CGI program in this simple example. |
The Perl program returns output which is properly formatted HTML. The server returns it to the browser, which displays it as a page. |
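As a minimal sketch of such a program (the file name hello.pl, the helper name hello_page, and the page text are illustrative assumptions, not the original example), the script prints a Content-type header, a blank line, and then the HTML page:

```perl
#!/usr/local/bin/perl
# hello.pl - hypothetical name; a CGI program that takes no form input.
use strict;
use warnings;

# Build the complete response: header, blank line, then the HTML body.
sub hello_page {
    my $now = localtime();    # server-side time as a readable string
    return "Content-type: text/html\n\n"
         . "<html><head><title>Hello</title></head>\n"
         . "<body><h1>Hello from a CGI program</h1>\n"
         . "<p>The time on the server is $now.</p></body></html>\n";
}

print hello_page();
```

The blank line after the header is what tells the server where the header ends and the document begins.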
Returning the HTML output is fairly simple, as the server and browser handle the encoding and decoding of the MIME-formatted message. The complications arise in sending text from the form to the CGI program: there are several ways to do it, and the CGI program must decode the message.
Two environment variables QUERY_STRING and PATH_INFO are used to pass data to the CGI program - there are several ways to do this. |
Using a normal HTML link to pass data:
|
When a form uses the method "get" to pass data, all the input that the user types is put in the QUERY_STRING variable:
<form method="get" action="http://www.some.box/name.pl">
Type your first name <input name="First Name"><br>
Type your last name <input name="Last Name"><br>
<input type="submit" value="submit">
</form>
If the user types "Winona" and "Ryder" as values, the QUERY_STRING environment variable would have the encoded value "First+Name=Winona&Last+Name=Ryder". |
Here is a Perl program that would print that string on the web client's window (without regular html tags): |
#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
print "You typed \"$ENV{QUERY_STRING}\" in the input boxes\n";
METHOD=GET is NOT RECOMMENDED for forms: the data travels in the URL and an environment variable, so long input can be truncated and lost!
The web server also makes available information about the user and the server, including such things as what type of browser made the request. |
A list of environment variables available to the CGI program: |
GATEWAY_INTERFACE    REMOTE_HOST
SERVER_NAME          REMOTE_ADDR
SERVER_SOFTWARE      AUTH_TYPE
SERVER_PROTOCOL      REMOTE_USER
SERVER_PORT          REMOTE_IDENT
REQUEST_METHOD       CONTENT_TYPE
PATH_INFO            CONTENT_LENGTH
PATH_TRANSLATED      HTTP_FROM
SCRIPT_NAME          HTTP_ACCEPT
DOCUMENT_ROOT        HTTP_USER_AGENT (browser)
QUERY_STRING         HTTP_REFERER
It is recommended to use a form with METHOD=POST to safely pass any amount of data through STDIN.
|
This subroutine works with either the GET or POST method, obtaining the user input string from the form into a scalar variable "$in". It then splits this string into fields into the array "@in", where each element contains the encoded string for one field. |
For each field string, the subroutine converts all the encoding symbols. It then creates an associative array "%in" with a keyword,value pair from each field of the web form. |
This subroutine can be used in any Perl CGI program. |
Suppose that a form has two fields named address and phone: |
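For such a form, the decoding described above can be sketched as follows. This is not the original ReadParse code: the names read_form_data and parse_form are hypothetical, and the sketch keeps only the last value when a field name repeats.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Fetch the raw, still-encoded form data for either request method.
sub read_form_data {
    my ($fh) = @_;                        # filehandle holding a POST body
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    if ($method eq 'POST') {
        my $raw = '';
        read($fh, $raw, $ENV{CONTENT_LENGTH} || 0);
        return $raw;
    }
    return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
}

# Decode "name=value&name=value" into a hash of field names to values.
sub parse_form {
    my ($raw) = @_;
    my %in;
    foreach my $field (split /&/, $raw) {
        my ($key, $value) = split /=/, $field, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                              # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %XX encodes one byte
        }
        $in{$key} = $value;
    }
    return %in;
}

my %in = parse_form(read_form_data(\*STDIN));
print "Content-type: text/html\n\n";
# A real program should escape these values before placing them in HTML.
print "<p>Address: $in{address}<br>Phone: $in{phone}</p>\n"
    if exists $in{address} && exists $in{phone};
```

After the call, $in{'address'} and $in{'phone'} hold the decoded text that the user typed into the two fields.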
Starting with Perl 5.004, the standard Perl release contains a module of library functions designed to make Perl CGI programming a lot simpler.
The "use" statement can include the CGI.pm module. The parameter, such as ":standard" or ":all", controls which library functions are imported.
It includes a function called param, which is similar to ReadParse, in returning the value of any field name. |
In addition, CGI.pm contains functions that take care of formatting HTML tags in the print statements. |
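A minimal sketch of these functions in use, assuming CGI.pm is installed and that the form has fields named address and phone; the helper name reply_page and the page title are hypothetical:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use CGI qw(:standard);   # imports param, header, start_html, h1, p, end_html

# Build the reply page from the two form fields.
sub reply_page {
    my $address = param('address');
    my $phone   = param('phone');
    $address = '(none given)' unless defined $address;
    $phone   = '(none given)' unless defined $phone;
    # A real program should escape the values before printing them.
    return header()
         . start_html('Your details')
         . h1('Your details')
         . p("Address: $address")
         . p("Phone: $phone")
         . end_html();
}

print reply_page();
```

Here header produces the Content-type header and blank line, while start_html, h1, p, and end_html produce the corresponding HTML tags.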
Sometimes you want the CGI program itself to generate a form (as part of its HTML output). CGI.pm has a set of functions for generating form tags and tags for the various form elements. By default, the same CGI program that generates a form is also used to process its input:
All output written by the CGI program to STDOUT is taken by the server to process. The output should start with a header in one of three types:
|
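The list of header types is missing from these notes. Assuming the three are the Content-type, Location, and Status headers defined by the CGI specification, each can be sketched in Perl as follows; the sub names are hypothetical.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Content-type: the program returns a document of the given MIME type.
sub content_header {
    my ($type) = @_;
    return "Content-type: $type\n\n";
}

# Location: the server sends the browser to another document instead.
sub location_header {
    my ($url) = @_;
    return "Location: $url\n\n";
}

# Status: the program reports an HTTP status code, then an HTML message.
sub status_header {
    my ($code, $reason) = @_;
    return "Status: $code $reason\n" . content_header('text/html');
}

# A program emits one header block, then (for Content-type) the body.
print content_header('text/html'), "<p>Here is the document.</p>\n";
```

In every case the header block ends with a blank line, which is why each sub finishes with two newlines.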
On the web server where you are doing CGI programming, put the HTML pages with forms in a directory somewhere under the server's "document root" and the CGI program somewhere under the server's "cgi bin". The CGI program must have its permissions set so that the server can execute it. Furthermore, if the CGI program reads or writes other files, the server must have permission to access them.
You should first debug your Perl program by executing it directly on the command line. If using ReadParse, there is a version that will also take input from STDIN. The param function of CGI.pm also allows this type of debugging: |
prog.pl < input.data
When a CGI program has an error of any kind, the server almost always reports that the error is in the header. In fact, the error could be anywhere in the program.
Your Perl script runs with the cgi-bin directory in which it resides as its current directory, so path names in any file system accesses that you program in Perl are evaluated relative to that directory. Suppose that you have the file system structure in the example below; then to open file1.txt or file2.txt from prog.pl, use:
open (FILE1, "file1.txt"); and open (FILE2, "../../htdoc/njm/file2.txt");
Many servers, including NCSA, use an Access Control File (ACF) to configure basic authorization for access to all web documents in a directory. |
The global ACF is named access.conf in the server's configuration directory. |
Any directory in the server's document space can have a local ACF named .htaccess. Here is the format of an NCSA ACF: |
An example of a .htaccess file |
The password file can be created with a program called htpasswd (which creates a file called .htpasswd) that is distributed with the server. The password file should not be kept in a publicly accessible directory. |
A web server can return a sequence of replies to the web browser by running a Non-Parsed Header (NPH) CGI script, usually of the form:
|
It also uses a special MIME type, which allows each reply to replace the previous one on the same browser page.
The main part of the document is a container, which has boundary strings between the individual entities, each starting with 2 dashes. The final boundary string also ends with 2 dashes to terminate the entire container. |
Suppose we have a set of gif files that make up an animation sequence: m01.gif, m02.gif, . . . The Perl program reads the list of file names from a text file and sends them as parts in a multipart MIME stream. |
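The program described above can be sketched as follows. The MIME type multipart/x-mixed-replace is assumed here, since the notes do not name it; the script name, the boundary string EndOfFrame, the list file frames.txt, and the sub names are all hypothetical.

```perl
#!/usr/local/bin/perl
# nph-animate.pl - hypothetical name for a non-parsed-header script.
use strict;
use warnings;

my $BOUNDARY = 'EndOfFrame';    # hypothetical boundary string

# Opening lines of an NPH reply: status line, multipart header, boundary.
sub stream_header {
    return "HTTP/1.0 200 OK\n"
         . "Content-type: multipart/x-mixed-replace;boundary=$BOUNDARY\n\n"
         . "--$BOUNDARY\n";
}

# One part of the container: its own header, the image bytes, a boundary.
# The final boundary gets two extra dashes to close the container.
sub stream_part {
    my ($bytes, $is_last) = @_;
    return "Content-type: image/gif\n\n" . $bytes
         . "\n--$BOUNDARY" . ($is_last ? '--' : '') . "\n";
}

# Read file names, one per line, from the list file and send each image.
sub send_animation {
    my ($list_file, $out, $delay) = @_;
    open(my $list, '<', $list_file) or die "Cannot open $list_file: $!";
    chomp(my @names = grep { /\S/ } <$list>);
    close($list);
    binmode($out);
    my $old = select($out); $| = 1; select($old);   # send each part at once
    print {$out} stream_header();
    for my $i (0 .. $#names) {
        open(my $gif, '<', $names[$i]) or die "Cannot open $names[$i]: $!";
        binmode($gif);
        local $/;                   # read the whole file in one go
        my $bytes = <$gif>;
        close($gif);
        print {$out} stream_part($bytes, $i == $#names);
        sleep $delay unless $i == $#names;   # pause between frames
    }
}

send_animation('frames.txt', \*STDOUT, 1) if -e 'frames.txt';
```

Turning off output buffering matters here: without it, several frames could sit in the buffer and reach the browser all at once.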
The GIF89a specification allows a GIF file to have multiple blocks of data, each with a different image. The images can be set to display one after another or as overlays that partially replace sections of the preceding image. They can be set to have time delays and also to loop. |
Multiple-block GIFs can be created by a program called GIF Construction Set, from Alchemy Mindworks:
|
The web server can send an HTTP response with a header of type Refresh. This causes the browser to request a new page automatically after some time. |
In Server Push, the HTTP connection is kept open between the browser and the server for the duration of the responses. In Client Pull, the connection is closed and reopened each time the browser sends another request for a refresh. Thus Server Push is best for small size files that are going to be sent at small intervals, like animations of small images, and Client Pull is best for larger or longer interval transmissions, such as stock ticker updates. |
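A minimal sketch of Client Pull from a CGI program, assuming a refresh interval of 30 seconds; the sub name refresh_header and the page text are hypothetical. When no URL is given, the browser requests the same page again.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Header block asking the browser to fetch a page again after $seconds.
# With no $url the browser reloads the current page.
sub refresh_header {
    my ($seconds, $url) = @_;
    my $refresh = defined $url
        ? "Refresh: $seconds; URL=$url\n"
        : "Refresh: $seconds\n";
    return $refresh . "Content-type: text/html\n\n";
}

print refresh_header(30);
print "<p>Page generated at ", scalar(localtime), "</p>\n";
```

Each refresh is a separate request, so the program runs from the start every time and the connection is closed in between.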
Any application with multiple forms processed by one or more cgi scripts may use hidden fields to pass information from one form to the next. |
A typical example is a shopping basket application. The first page can collect user information.
Note that hidden fields are not secret - the user can see them by selecting "View - Document Source". |
After the initial information page, a cgi script produces the first shopping catalog page, adding hidden fields for user information and initializing a hidden field for the "current shopping basket". |
Each time the user submits a form from a shopping catalog page, the CGI script adds the selected items to the current shopping basket field. Finally, the order is itemized and the transaction is completed.
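The basket bookkeeping can be sketched as follows, assuming the basket is carried as a comma-separated list in a hidden field named basket. The field names, the script name catalog.pl, the sample item, and the sub names are all hypothetical.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Escape characters that are special inside an HTML attribute value.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Add the newly selected items to the basket carried in the hidden field.
sub update_basket {
    my ($basket, @selected) = @_;
    my @items = grep { length } split /,/, $basket;
    push @items, @selected;
    return join ',', @items;
}

# Form for the next catalog page, carrying the state forward.
sub catalog_form {
    my ($user, $basket) = @_;
    return qq{<form method="post" action="catalog.pl">\n}
         . qq{<input type="hidden" name="user" value="}
         . escape_html($user) . qq{">\n}
         . qq{<input type="hidden" name="basket" value="}
         . escape_html($basket) . qq{">\n}
         . qq{<input type="checkbox" name="item" value="lamp"> Lamp<br>\n}
         . qq{<input type="submit" value="Add to basket">\n}
         . qq{</form>\n};
}

print "Content-type: text/html\n\n";
print catalog_form('Winona Ryder', update_basket('book,pen', 'lamp'));
```

Because the user can view and alter hidden fields, the script should treat the returned basket as untrusted input.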
The Netscape browser can store information in fields on the client side. The browser stores this information in a text file in the browser directory and can thus provide "persistent" cookies: the information can be saved across many browser invocations, as well as across transactions.
Basically, the browser sets up a cookie whenever it receives a "Set-Cookie" header from a server, and it will pass back information in the "HTTP_COOKIE" environment variable whenever the user requests a document that fits the validity parameters of the Set-Cookie header. |
Cookies are more general than hidden fields, but they currently work only on Netscape browsers; work is under way on a general standard for an HTTP state-info mechanism.
Current cookie specification is at URL: |
http://home.netscape.com/newsref/std/cookie_spec.html |
Each cookie must have a key,value pair defining the cookie. In addition, there are optional fields defining the validity of cookie requests:
|
Send one or more cookies using the Set-Cookie field in a header: |
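A minimal sketch of sending cookies, assuming the optional validity fields are expires, path, domain, and secure as in the Netscape cookie specification. The sub names and the cookie values are hypothetical.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Encode a key or value the same way form fields are encoded.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.\- ])/sprintf('%%%02X', ord($1))/ge;
    $text =~ tr/ /+/;
    return $text;
}

# Build one Set-Cookie header line; %opt holds the optional fields.
sub set_cookie {
    my ($key, $value, %opt) = @_;
    my $line = 'Set-Cookie: ' . url_encode($key) . '=' . url_encode($value);
    $line .= "; expires=$opt{expires}" if $opt{expires};
    $line .= "; path=$opt{path}"       if $opt{path};
    $line .= "; domain=$opt{domain}"   if $opt{domain};
    $line .= "; secure"                if $opt{secure};
    return "$line\n";
}

# Cookie lines come before the Content-type line and the blank line.
print set_cookie('full name', 'Max Planck',
                 path => '/', expires => 'Wednesday, 31-Dec-97 23:59:59 GMT');
print set_cookie('Occupation', 'physicist', path => '/');
print "Content-type: text/html\n\n";
print "<p>Cookies sent.</p>\n";
```

A cookie sent without an expires field lasts only for the current browser session.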
If any request for a document or script satisfies the validity requirements, then an environment variable HTTP_COOKIE is set to have all the cookie values. |
Cookies are key, value pairs and are encoded just like all other fields, EXCEPT that they are separated by ; (instead of &). |
In Perl, $ENV{'HTTP_COOKIE'} could have the value: full+name=Max+Planck;Occupation=physicist
This can be decoded in a similar fashion as the standard ReadParse subroutine, except that you must split the fields on ";". |
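A minimal sketch of that decoding; the sub name parse_cookies is hypothetical. The split pattern also accepts a space after each ";", which browsers commonly send.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Decode an HTTP_COOKIE string into a hash of cookie names to values.
sub parse_cookies {
    my ($raw) = @_;
    my %cookie;
    return %cookie unless defined $raw;
    foreach my $field (split /;\s*/, $raw) {
        my ($key, $value) = split /=/, $field, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                              # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %XX encodes one byte
        }
        $cookie{$key} = $value;
    }
    return %cookie;
}

my %cookie = parse_cookies($ENV{HTTP_COOKIE});
print "Content-type: text/html\n\n";
# A real program should escape these values before placing them in HTML.
foreach my $name (sort keys %cookie) {
    print "<p>$name = $cookie{$name}</p>\n";
}
```

With the example value above, $cookie{'full name'} is "Max Planck" and $cookie{'Occupation'} is "physicist".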
SSI (Server-Side Includes) allows the server to insert data into documents that are requested by the browser. The server is usually configured so that these modifiable documents have the file extension .shtml. (The server associates this file extension with the MIME type text/x-server-parsed-html.)
SSI slows down the server, since it must check for .shtml files and parse each one before returning it.
SSI also allows the execution of programs and so has some of the functionality of using CGI scripts. |
A .shtml file has regular html, plus it may have SSI entries of the form: |
<!--#command tag1=value1 tag2=value2 . . . -->
<!--#echo var="DOCUMENT_NAME"-->
The echo command is used with the var tag to print the values of any Environment Variables from the standard list known to web servers/browsers plus this set for SSI:
DOCUMENT_NAME, DOCUMENT_URI, QUERY_STRING_UNESCAPED, DATE_LOCAL, DATE_GMT, LAST_MODIFIED
<!--#include file="address.html"-->
<!--#include virtual="/projects/mine/address.html"-->
These commands include the contents of the indicated files.
<!--#exec cgi="/cgi-bin/counter.pl"-->
This command allows the execution of other programs. These programs don't necessarily have to be in the server's cgi-bin directory, but they may raise special security issues. |