The following steps give a simplified overview of what happens when a client requests a CGI process (in-depth details appear in the following sections). Figure 2.1 is a high-level graphical overview of this process.
magnus.conf file. All of these servers await requests from the port (also specified in the magnus.conf file). Only one of the threads gains ownership of any specific arriving request, as shown in Figure 2.2.
How CGI programs relate to HTTP clients and servers
The server creates the CGI process
When a server receives a request that must be handled by an external application (a CGI request), that server creates a copy of itself (in Unix terms, it forks a process). This second process is called the CGI process because it is the process in which the CGI program will run. The CGI process has all the same communication pathways that the server process has. The only purpose of the CGI process is to set up communications between the CGI program and the server. Be careful about creating CGI programs that can run indefinitely, because as long as the program runs, your server has one fewer thread for servicing requests.
Because it is a copy of the server, the CGI process has access to information about the CGI request. For example, the CGI process knows
Security concerns about CGI
CGI is a powerful tool for interacting with users, but it can also be a potential security problem. You should follow these guidelines when implementing CGI on your server:
/./
/../
//
.cgi extension runs as a CGI program. This method is the most flexible, but you need to be careful about which directories you put CGI programs in. You shouldn't put CGI programs in directories into which users can upload information because users can then write CGI programs that return information such as your /etc/passwd file or that delete files from your server machine.
/cgi-bin, which maps to a directory in the server root called cgi-bin. The cgi-bin directory is where you store all of your CGI programs.
no way to service request for /some/stuff.cgi
[virtual path][extra path information]?[query string]
/usr/docs/ and a request comes in with extra path information like /test/test.html, the server translates the path to /usr/docs/test/test.html and stores that translated path in an environment variable.
/misc/search.cgi, and the document root is in the physical directory /Netscape/docs.
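The path translation described above can be sketched in Python. This is only an illustration that reproduces the /usr/docs/ example; the function name translate_path and the rule of rejecting paths that climb out of the document root are assumptions, not the server's documented behavior.

```python
import posixpath

def translate_path(document_root, extra_path):
    """Map extra path information onto a physical path under document_root."""
    root = posixpath.normpath(document_root)
    # Treat the extra path as relative to the root, then collapse
    # elements such as /./, /../ and // before checking the result.
    candidate = posixpath.normpath(
        posixpath.join(root, extra_path.lstrip("/")))
    # Assumption: a path that leaves the document root is refused.
    if candidate != root and not candidate.startswith(root + "/"):
        raise ValueError("extra path escapes the document root")
    return candidate

if __name__ == "__main__":
    print(translate_path("/usr/docs/", "/test/test.html"))
```

Normalizing before the check matters: a request such as /../etc/passwd would otherwise resolve to a file outside the document root.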
Accepting user input from URLs
There are three types of information the client can send to a CGI program:
%xx, where x is a hexadecimal digit. For example, %21 is an exclamation point. See the Administrator's Guide for a list of escaped characters.
name1=value1&name2=value2... &nameN=valueN
= or & characters in the data, they're encoded using URL encoding. This avoids ambiguity when your program translates the form data. To properly decode this data, your CGI program should first split it into name-value pairs (eliminating the ampersands), then split each pair into a name and a value, and then apply URL decoding to the name portion and to the value portion of the pair.
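The decoding order described above (split on ampersands, split each pair, then decode each half) can be sketched in Python as follows. The function name parse_form_data is hypothetical. The sketch uses the standard library's unquote_plus, which also turns + into a space, as is conventional for form data.

```python
from urllib.parse import unquote_plus

def parse_form_data(data):
    """Split URL-encoded form data into a list of (name, value) pairs."""
    pairs = []
    for item in data.split("&"):      # 1. split into name-value pairs
        if not item:
            continue                  # skip empty segments such as "a=1&&b=2"
        # 2. split each pair at the first "=" only
        name, _, value = item.partition("=")
        # 3. decode the name and the value separately, after splitting
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

if __name__ == "__main__":
    print(parse_form_data("name=J%26J&note=a%3Db+c"))
```

Decoding only after splitting is what keeps an encoded & or = inside a value from being mistaken for a separator. A list is returned rather than a dictionary so that repeated names are preserved.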
When a user submits a form, you can use the order of the form items to determine what order your CGI program receives the name-value pairs. However, you should not depend on this behavior. The various form elements have their own rules for determining what value is associated with the name they are given:
Input to an ISINDEX search dialog
When the data comes from a search dialog resulting from an ISINDEX tag, the escaped character decoding is done by the CGI process. Your CGI program receives this information fully translated as command-line arguments. This is handy if you want to avoid the hassle of performing this translation yourself.
ISMAP or imagemaps
In the case of clickable images, the data you receive from the client software is sent as a query string that takes the form xx,yy where xx and yy are the coordinates of where the user clicked the image. Coordinates are measured as the number of pixels from the upper left corner of the image the user clicked. This lets your CGI program respond differently depending on where the user clicked the image.
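As a rough sketch, a Python CGI program could turn the xx,yy query string into integer coordinates like this. The function names parse_click and clicked_half are hypothetical, and the left/right hit test is only a stand-in for whatever regions your image defines.

```python
def parse_click(query_string):
    """Parse an imagemap query string of the form 'xx,yy' into integers."""
    x_text, comma, y_text = query_string.partition(",")
    if not comma:
        raise ValueError("expected coordinates in the form xx,yy")
    return int(x_text), int(y_text)

def clicked_half(x, image_width):
    """Hypothetical hit test: report which half of the image was clicked."""
    return "left" if x < image_width / 2 else "right"

if __name__ == "__main__":
    x, y = parse_click("120,45")
    print(x, y, clicked_half(x, 400))
```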
Data the server sends to the CGI program
The server sends data to the CGI program in three ways: environment variables, standard input, and command-line arguments.
Environment variables are the most common method used to pass data about a request to your CGI program. This data comes from the server software itself, from the network socket connecting the client to the server, and from the URL that was used to access the CGI program (for example, when using the GET method, the data is sent to the QUERY_STRING variable).
This section describes how to obtain the information stored in environment variables, standard input, and command-line arguments.
Accessing environment variables
There are several ways CGI programs can access environment variables. The method you use depends on the programming or scripting language you use. Environment variables are identified by character strings and have character string values. This section lists some examples of the different methods programming languages use to access environment variables.
Using Java Applets
In Java, environment variables are accessed as "system properties" by use of the System.getProperties or System.getProperty method or other related methods.
String rhost = System.getProperty("REMOTE_HOST");
Using C or C++
In C or C++, you can use the getenv library call to access the environment variables.
#include <stdlib.h>
...
char *rhost = getenv("REMOTE_HOST");
Using Perl
In Perl, environment variables are accessed through the %ENV associative array (hash).
$rhost = $ENV{'REMOTE_HOST'};
Using the Bourne shell
In the Bourne shell (/bin/sh), environment variables are accessed just like normal shell variables.
RHOST=$REMOTE_HOST
Using the C shell
The C shell is similar to the Bourne shell, but it needs the keyword set before any variable assignment.
set RHOST = $REMOTE_HOST
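For comparison, the same lookup in Python (a language this guide does not otherwise cover) might look like the sketch below. The helper name remote_host is hypothetical. Falling back to REMOTE_ADDR follows this guide's note that scripts rely on that variable when no host name is available.

```python
import os

def remote_host(environ=os.environ):
    """Return the client's host name, or its IP address if no name is set."""
    # REMOTE_HOST may be missing or empty when the server has no host
    # name for the client, so fall back to REMOTE_ADDR.
    return environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR", "")

if __name__ == "__main__":
    print(remote_host())
```

Passing the environment as a parameter keeps the function easy to exercise outside a server.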
Environment variables and their formats
This section lists all of the environment variables and their formats. The Netscape servers pass only these environment variables to the CGI, in order to save storage space and improve security.
SERVER_SOFTWARE
This environment variable contains the name and version of the software that your program is running under.
Format
name/version
Example
Netscape-FastTrack/2.0
Netscape-Enterprise/2.0
SERVER_NAME
This environment variable contains the domain name or IP address of the server machine.
Format
A fully-qualified domain name or IP address
Example
198.93.93.10 or www.netscape.com
SERVER_URL
This environment variable contains the URL that individuals should use to access this server. This variable is not supported by revision 1.1 of the CGI interface. It is only available using Netscape web servers.
Format
protocol://hostname[:port]
If the server is running on a protocol's default port, the :port section won't be present.
Example
http://www.netscape.com:8081
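The rule about omitting the port can be illustrated with a short Python sketch. The function name server_url and the DEFAULT_PORTS table are illustrative assumptions; the table lists only the well-known defaults for HTTP and HTTPS.

```python
# Well-known default ports; other protocols are not covered by this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}

def server_url(protocol, hostname, port):
    """Build a URL in the form protocol://hostname[:port]."""
    if DEFAULT_PORTS.get(protocol) == port:
        # On the protocol's default port, the :port section is left out.
        return f"{protocol}://{hostname}"
    return f"{protocol}://{hostname}:{port}"

if __name__ == "__main__":
    print(server_url("http", "www.netscape.com", 8081))
    print(server_url("http", "www.netscape.com", 80))
```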
GATEWAY_INTERFACE
This environment variable contains the revision of the CGI specification supported by the server software.
Format
CGI/n.n
n.n is the numerical revision.
Example
CGI/1.1
SERVER_PROTOCOL
This environment variable contains the name and revision of the protocol being used by the client and server.
Format
name/version
Example
HTTP/1.0
SERVER_PORT
This environment variable contains the number of the port to which this request was sent.
Format
A number between 1 and 65,535
Example
80
REQUEST_METHOD
This environment variable contains the name of the method (defined in the HTTP protocol) to be used when accessing URLs on the server. When a hyperlink is clicked, the GET method is used.
When a form is submitted, the method used is determined by the METHOD attribute to the FORM tag. (See page 33 for more information.)
CGI programs do not have to deal with the HEAD method directly and can treat it just like the GET method.
Format
method
Examples
GET, HEAD, POST
PATH_INFO
This environment variable contains the extra path information that the server derives from the URL that was used to access the CGI program.
Format
/dir1/dir2...
Example
/html/graphics/doc1.gif
PATH_TRANSLATED
This environment variable contains the actual fully-qualified file name that was translated from the URL. Netscape web servers distinguish between path names used in URLs and file system path names. It is often useful to make your PATH_INFO a virtual path so that the server provides a physical path name in this variable. This way, you can avoid giving file system path names to remote client software.
Format
/dir1/dir2...
Example
/Netscape/docs/doc1.html
SCRIPT_NAME
This environment variable contains the name of the virtual path to your program. If your program needs to refer the remote client back to itself, or needs to construct anchors in HTML referring to itself, you can use this variable.
Format
/dir1/dir2/progname
Examples
/orders/tickets.cgi, /cgi-bin/order-tickets
QUERY_STRING
This environment variable contains information from an HTML page to your script in these three instances:
play or view returned from the following:
I want to <A HREF=multimed.cgi?play>play some music!</A>
I want to <A HREF=multimed.cgi?view>view a graphic!</A>
From a form, you might get button1=on&button2=off, or from a document that contains the ISINDEX tag you might get two+words.
machine.subdomain.domain
www.netscape.com
If no host name information is available, the script relies on the REMOTE_ADDR variable instead.
198.93.93.10
basic
jdoe
/
subtype
Content-Length: 64
on or off, depending on whether or not security is active on the server.
HTTP_. All letters in the name are changed to upper case. All hyphens are changed to underscore characters. Examples of these HTTP headers are described in the following sections.
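The naming rule above is mechanical enough to express in one line of Python. The function name header_to_env_name is hypothetical; the transformation itself (HTTP_ prefix, upper case, hyphens to underscores) is the one just described.

```python
def header_to_env_name(header_name):
    """Convert an HTTP request header name to its CGI environment variable."""
    # Prefix with HTTP_, upper-case the letters, and turn hyphens
    # into underscores.
    return "HTTP_" + header_name.upper().replace("-", "_")

if __name__ == "__main__":
    print(header_to_env_name("User-Agent"))
```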
type/subtype[,type/subtype]...
image/gif,image/jpeg, */*
Mozilla/1.1N (Windows)
Weekday, dd-mon-yy hh:mm:ss GMT
The Weekday specifies the full name of the day, such as Thursday or Friday. The dd specifies the number of the day of the month. The mon specifies the three-letter abbreviation of the month. The yy specifies the current year within the century. The hh:mm:ss gives the current time in 24-hour format.
Saturday, 12-Nov-94 14:05:51 GMT
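A date in this layout can be parsed in Python with datetime.strptime, as in the sketch below. The function name parse_http_date is hypothetical. The sketch assumes English day and month names (the default C locale), and it relies on Python mapping two-digit years 69 through 99 to 1969 through 1999.

```python
from datetime import datetime, timezone

def parse_http_date(text):
    """Parse 'Weekday, dd-mon-yy hh:mm:ss GMT' into an aware datetime."""
    # %A = full weekday name, %b = three-letter month, %y = two-digit year.
    parsed = datetime.strptime(text, "%A, %d-%b-%y %H:%M:%S GMT")
    # The value is always expressed in GMT, so attach UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc)

if __name__ == "__main__":
    print(parse_http_date("Saturday, 12-Nov-94 14:05:51 GMT").isoformat())
```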
= characters (you must encode the = sign if you want to use it as an argument).
= character to determine whether to use the command line. This means the client applications must encode the = sign in ISINDEX queries if the CGI program is to use the command-line arguments.
If the server finds any encoded = characters, it decodes the query information by first splitting it at the plus signs in the URL. It then performs additional character decoding before placing the resulting characters as command-line arguments. For example, the information name+date is split and sent as command-line arguments name date to the CGI program.
Note
If the server cannot send the string due to internal limitations (such as exec() or /bin/sh command-line restrictions) the server sends no command-line information and provides the nondecoded query information in the environment variable QUERY_STRING.
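The splitting and decoding of ISINDEX queries can be approximated in Python as shown below. This is a sketch of the general CGI convention, not the server's actual code: the function name isindex_arguments is hypothetical, and treating any literal = in the query as "do not use the command line" is an assumption drawn from the description above.

```python
from urllib.parse import unquote

def isindex_arguments(query_string):
    """Return the command-line arguments for an ISINDEX query, or None."""
    if "=" in query_string:
        # A literal (unencoded) "=" means the query is not passed
        # on the command line.
        return None
    # Split at the plus signs first, then decode each word separately.
    return [unquote(word) for word in query_string.split("+")]

if __name__ == "__main__":
    print(isindex_arguments("name+date"))
```

Splitting before decoding keeps an encoded plus sign (%2B) inside a word from being treated as a separator.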
nph-". The server makes the standard output a direct copy of the socket to the client. Once you activate this feature, your CGI program is responsible for any protocol-related response headers or messages, including the following:
HTTP/1.0 200 OK
Date: DayOfWeek, DD-Mon-YY nn:nn:nn GMT
Server: Netscape-FastTrack/2.0
MIME-version: 1.0
Even though you can control the nonparsed header feature, in most cases you should avoid it because CGI programs must print a valid CGI header on the standard output in order for the server to accept the response and send it to the client. In the Netscape server, the standard output and the standard error file streams are directed to the same place: back into the server. This means that errors your program generates or system utilities your program calls can interfere with your header. Similarly, if your program is abnormally terminated (through a bug or some other disaster), the server will send a server error to the client and describe the error in the server's error log file. Because of this, you should print your header as early as possible in your program.
name: value
The end of the header is signalled by a single blank line. After the blank line, the server stops parsing your program's header and sends the rest of your data untouched to the client. This means that your program can output any type of data it needs to, including HTML, GIFs, or JPEGs. Each name-value pair is an HTTP protocol header. You can output any header you want and the server sends it to the client. However, if the server detects odd header lines, the server logs a 500 error and doesn't return any data to the client. Some of the commonly used HTTP headers are described here. When you output any of these headers, the server doesn't alter their values or their output.
text/html, text/plain, image/gif, image/jpeg, audio/basic
Saturday, 12-Nov-94 14:05:51 GMT
x-gzip for GNU zip compression
x-compress for standard Unix compression.
Note
You do not necessarily have to redirect to an HTTP URL; you can redirect to a Gopher, news, FTP, or any other valid URL.
If the location is a virtual path, such as /misc/file.html, then the server restarts the request using the virtual path, for example http://mysrvr/misc/file.html.
However, the client isn't informed of the new location, so any relative links in the document are resolved from the directory of your CGI program, not of the document that is actually being returned. This means images referenced in the document might not work because the client might be looking for them in the wrong directory.
200 OK unless a Location header with a full URL is present. If the location is present, the default is 302 Found.
The status line has the form nnn reason, where nnn is the three-digit code for the request, and reason is a short string describing the error. The following codes and reasons are currently recognized by Netscape Navigator.
Sample program output
The following CGI program output sends an HTML document back to the client:
Content-type: text/html

<title>My little document</title>
This is my own little document. Do you like it?
The following output instructs the client to retrieve a different URL. The small HTML fragment at the bottom allows any navigation software that doesn't support redirection to retrieve the given URL.
Location: http://www.sample.org/abc/afile

This document can be accessed at the following <a href="http://www.sample.org/abc/afile">location</a>.
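A CGI program written in Python could produce output of this shape with a small helper such as the following sketch. The function name cgi_response is hypothetical. The essential detail is the single blank line that separates the header lines from the body.

```python
import sys

def cgi_response(headers, body=""):
    """Build CGI output: 'name: value' lines, one blank line, then the body."""
    lines = [f"{name}: {value}" for name, value in headers]
    # The blank line after the last header tells the server to stop
    # parsing and pass the rest of the output through untouched.
    return "\n".join(lines) + "\n\n" + body

if __name__ == "__main__":
    sys.stdout.write(cgi_response(
        [("Content-type", "text/html")],
        "<title>My little document</title>\n"
        "This is my own little document. Do you like it?\n"))
```

Building the whole response first and writing it in one call makes it less likely that stray output lands in front of the header.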
Configuring your server to use CGI programs
Before your server can use CGI programs, you must either specify that all files in designated directories are CGI programs or activate CGI as a MIME type. This procedure is covered in the Administrator's Guide.
Customizing server-parsed HTML
Normally, HTML documents are sent to the client exactly as they are stored on disk. However, sometimes you might find it useful to have the server parse these files and insert request-specific information or files into the document. You can do this through server-parsed HTML.
To customize server-parsed HTML:
exec command. The exec command lets an HTML file run an arbitrary program on the server (you might want to deactivate the exec command for security or performance reasons).
.shtml for server-parsed HTML files. Sometimes you might not want to use a different file name extension.
The server can also parse only files with the Unix file permissions set so the execute bit is on. This is often unreliable because some documents have the execute bit set even though they aren't really executable.
The server can also look at every HTML file on the server. This can be a large performance hit because the server must look at every single HTML file it sends back from the parsed directory. (If the directory isn't that large, this might not be that much of a problem.)
<!--#command attribute1 attribute2 ... -->
The command must be in lower case. The format for each attribute is the typical name-value pair:
name="value"
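To make the command syntax concrete, here is a minimal Python sketch that extracts commands and their attributes from a document. It is not how the server parses files; the names DIRECTIVE, ATTRIBUTE, and parse_directives are hypothetical. Folding attribute names to lower case is an assumption, based on this guide's examples writing them in both cases.

```python
import re

# <!--#command name="value" ... -->  (the command name is lower case)
DIRECTIVE = re.compile(r'<!--#([a-z]+)\s+(.*?)\s*-->', re.S)
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')

def parse_directives(html):
    """Return a list of (command, {attribute: value}) tuples found in html."""
    result = []
    for command, attr_text in DIRECTIVE.findall(html):
        # Assumption: attribute names are treated case-insensitively.
        attrs = {name.lower(): value
                 for name, value in ATTRIBUTE.findall(attr_text)}
        result.append((command, attrs))
    return result

if __name__ == "__main__":
    print(parse_directives('Size: <!--#fsize FILE="bottle.gif"-->'))
```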
The errmsg attribute defines a message sent to the client when an error occurs while parsing the file. The error is also logged in the error log file.
The timefmt attribute determines the format for dates that are delivered by the flastmod command. It uses the same format as the strftime library call in C.
The sizefmt attribute determines the format for reporting the file size. It can have the values:
bytes to report file size as a whole number in the format 12,345,678.
abbrev to report file size as a number of KB or MB. This is the default.
<!--#config TIMEFMT="%r %a %b %e, %Y" sizefmt="abbrev"-->
This gives you a date format like 08:23:15 AM Wed Apr 15, 1996, and a file size format that reports the number of KB or MB of characters allocated to the file.
The virtual attribute is a virtual path to the file.
The file attribute is a relative path name from the current directory. You can't use elements such as ../ and you can't use absolute paths.
<!--#include FILE="bottle.gif"-->
var to specify the variable to echo.
<!--#echo VAR="DATE_GMT"-->
sizefmt attribute in the config command.
<!--#fsize FILE="bottle.gif"-->
timefmt attribute in the config command.
<!--#flastmod FILE="bottle.gif"-->
cmd attribute runs a command using /bin/sh. You can include any special environment variables in the command.
cgi attribute runs a CGI program and includes its output in the parsed file.
<!--#exec CGI="workit.pl"-->
/user/test.shtml).
The QUERY_STRING_UNESCAPED environment variable contains the unescaped version of any search query the client sent with all shell-special characters escaped with the \ character.
The DATE_LOCAL environment variable contains the current date and local time.
The DATE_GMT environment variable is similar to DATE_LOCAL but is expressed in Greenwich mean time.
The LAST_MODIFIED environment variable contains the date the file was last modified.
*.html.
Choose what format the last modification date should have. You can choose from the list of formats given, or you can specify your own using the strftime format. See your system's documentation about the strftime function for details on that format.
Finally, you can type your text trailer using HTML tags and entity encoding. You can use up to 254 characters. Any existing trailer appears in the box.
Note
Any entities you type in the trailer are decoded if you later edit the trailer. Be sure to re-encode the entities before submitting the form again! For example, if you use "&amp;" in your trailer, when you later edit the trailer you'll see simply "&". You need to change "&" to "&amp;" before submitting the form.