Introductory Guide to Writing CGIs

INTRODUCTION

CGIs are one means of producing dynamic web pages. They can alter the content of your Web pages based on browser, time of day, user information or a host of other factors. For large-scale web application development, alternatives to CGIs such as ISAPI, Java Servlets and the Zeus Distributed Authentication and Content Generation API exist and should be considered for performance reasons. However, small, simple CGI programs can make your Web site a far more interesting place to be than purely static pages.

CGI is an acronym for Common Gateway Interface; it defines the method used by web servers to communicate with external programs. CGIs are generally easy and quick to produce and can be written to power even the most complicated of Web sites. It should be noted that the flexibility of CGI programs is also a potential weakness, not least because badly written programs can introduce security problems. This document is intended as an introduction to CGI programming, not a comprehensive tutorial. Users wishing to take CGI programs further than the scope of this document should consult the wealth of online documentation or a good book on the subject.

CGI programs can be written in almost any language, including but not limited to C/C++, FORTRAN, Pascal, PERL, SH and Python. CGI programs are often referred to as CGI scripts. Although scripting languages are a popular way of developing CGI programs, the name is a misnomer: CGI programs can be, and often are, traditional compiled programs. A particular advantage of scripting languages for CGI programming is portability. An interpreted language is not tied to one platform, and should transfer easily between different operating systems provided an interpreter exists, whereas compiled programs will always have to be recompiled if you move your web server to another platform or OS.

We shall primarily use PERL (Practical Extraction and Report Language) for our examples. PERL, although originally a UNIX tool, is now available on almost all computer platforms including Windows 95 / NT, Mac and OS/2. PERL is an interpreted language and so may run a little slower than compiled programs. It does, however, have a number of useful functions for dealing with strings and lists which are applicable to CGI programming.

BASIC CONCEPTS

The beauty of CGI programs lies in their simplicity. In order to interface with the Zeus Server all you need to do is write to the standard output stream from your chosen language. In PERL you would use the print function; in C you would use printf(). Any information generated in this way will be passed on by the Zeus Server to the client.

CGIs can output any type of information: regular text, HTML, images, even audio. It is therefore important for a CGI to identify what type of information it is sending back to the browser so that it is correctly displayed. The browser expects any data it receives to be prefixed with HTTP Headers, specifically, in this context, a Content-Type header. The Content-Type header needs to be set to the correct MIME type of the data sent to the browser. Generally your CGIs will return text or HTML, which have MIME types of text/plain and text/html respectively.

So an example of the first line your CGI program needs to output will be:

print "Content-Type: text/plain", "\n\n";
The "\n\n" is required to terminate the HTTP Headers section before you send back the data: the first newline ends the Content-Type line and the second produces the blank line which marks the end of the headers.
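
The same rule applies whichever language you use. As a minimal sketch in C, the helper below writes the Content-Type line followed by the blank line. The name write_cgi_header and the decision to pass the output stream as a parameter are assumptions made for this illustration; they are not part of CGI or of the Zeus Server.

```c
#include <stdio.h>

/* Hypothetical helper: write the Content-Type header and the blank
   line that terminates the HTTP Headers section.
   Returns 0 on success and -1 if the write failed. */
int write_cgi_header(FILE *out, const char *mime_type)
{
    /* The first "\n" ends the header line, the second is the blank line */
    if (fprintf(out, "Content-Type: %s\n\n", mime_type) < 0)
        return -1;
    return 0;
}
```

A CGI program would call write_cgi_header(stdout, "text/plain") once, before printing any data.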

MY FIRST CGI PROGRAM

We now know enough information to write a real CGI program. We shall start where all good tutorials start, with a hello world program.

Hello World in PERL :

#!/usr/bin/perl
print "Content-Type: text/plain", "\n\n";
print "Hello World in PERL", "\n";
The first line of our PERL program tells the web server where to find the PERL interpreter; contact your local system administrator if you don't know where this is. We then output the HTTP Headers (in this case we only supply the Content-Type field), followed by the blank line required to terminate the Headers section. Finally we output our data.

Hello World in C :

#include <stdio.h>
int main(void)
{
   /* The header line, then the blank line that ends the headers */
   printf("Content-Type: text/plain\n\n");
   printf("Hello World in C!\n");
   return 0;
}
These short, simple programs show how easy it is to write CGI programs; they are not, however, particularly dynamic! To produce dynamic web pages, CGI programs require information from the server on which to act. There are two types of information which the Zeus Server passes to the CGI program: information about the request, such as details of the server and the client, and data which the user has entered into a form.

All this information, with the exception of some types of form data, is passed in environment variables for the CGI program to read.

ENVIRONMENT VARIABLES

Environment variables are very important in CGI programming; they pass the CGI program essential information which it needs to generate its output. In PERL, environment variables are accessed through the associative array %ENV; consult your language documentation on how to do this in other languages.

The %ENV associative array is indexed by the variable name. For full details of all the variables passed in the CGI environment please consult the Zeus CGI Reference Manual. In the next example we use two such variables, SERVER_NAME and HTTP_USER_AGENT. The SERVER_NAME variable contains the web server's Internet host name. The HTTP_USER_AGENT variable contains the client (usually a browser) identification string, as sent by the client.

Environment Example.

#!/usr/bin/perl
print "Content-Type: text/html", "\n\n";
print "<HTML>","\n";
print "<HEAD><TITLE>Environment example</TITLE></HEAD>", "\n";
print "<BODY><H1>Some Environment Variables</H1>", "\n";
print "<HR>","\n";
print "The server host name is : ", $ENV{'SERVER_NAME'},"<p>";
print "and the client is : ", $ENV{'HTTP_USER_AGENT'},"<p>";
print "<HR>","\n";
print "</BODY></HTML>";
exit 0;
Again the first line tells the server where to find the PERL interpreter. The second line has changed to tell the client that this time we are sending back HTML output, not just plain text. The next four lines just print out the HTML necessary to display a title, a heading and a horizontal line; these could all have been included in a single print statement, but are split for clarity. The following two lines display the environment variables along with a brief description of what they are. In this example we only echo the values back to the user, a relatively pointless exercise, but we could have displayed different information for different client (browser) strings. The last three lines finish the HTML and end the PERL script.

The output from this script looks something like this:

Some Environment Variables


The server host name is : digital.zeus.co.uk

and the client is : Mozilla/4.01 [en] (WinNT; I)


CGI AND FORMS

The last example might have included some dynamic elements, but it wasn't very useful. We'll now look at how we can ask users for information and then act on it to generate personalised Web pages.

HTML Forms are the standard method of requesting information from the user. They provide a simple means of displaying text boxes, check boxes and radio buttons within the browser. For full information on HTML forms consult the HTML specifications at the World Wide Web Consortium (http://www.w3.org).

A basic HTML Form

<HTML>
<HEAD>
<TITLE>A simple form</TITLE>
</HEAD>

<BODY>
<H1>Please enter your name!</H1>
<FORM ACTION="processform.cgi" METHOD=POST>

Please enter your name <INPUT TYPE="text" NAME="name"> <p> 
Please enter your email <INPUT TYPE="text" NAME="email"> <p>

<INPUT TYPE="submit" VALUE="Submit the form">
<INPUT TYPE="reset" VALUE="Clear all fields">

</FORM>
</BODY>
</HTML>

The important lines in the HTML are :
<FORM ACTION="processform.cgi" METHOD=POST>
This line defines the CGI program to run when the submit button is pressed. It also defines how the data is passed to the CGI program. There are two methods by which data can be passed to CGI programs, GET and POST. POST is generally the more useful and more widely used method, so that is the one we'll use here. For a more complete list of differences between the two methods please consult the Zeus CGI Reference Document.

And also :

Please enter your name <INPUT TYPE="text" NAME="name"> <p> 
Please enter your email <INPUT TYPE="text" NAME="email"> <p>
These lines define two text input boxes labelled name and email; these labels are used in the CGI program to read the associated values.

CGI Form Code

The PERL script to process the form information is a little more complicated than the previous examples. The POST method passes the data from the form straight to the CGI program through standard input. The data is also encoded before it is sent: spaces are replaced by +'s, and other special characters are replaced by a % followed by their ASCII value in hexadecimal. Full details of the POST method are available in the Zeus CGI Reference Documentation.

Process Form CGI

#!/usr/bin/perl
$form_data_size = $ENV{'CONTENT_LENGTH'};
read (STDIN, $form_data, $form_data_size);
@info_returned = split (/&/,$form_data);
Here we read the information entered in the form into the variable $form_data; the total length of the data is passed via the environment variable CONTENT_LENGTH. The data is then split on &'s, which separate the fields of the HTML Form, into the array @info_returned.
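
The same step can be sketched in C. The function below reads exactly the number of bytes announced in CONTENT_LENGTH from an input stream. The name read_form_data, the max_len limit and the decision to return a malloc'd string are assumptions made for this illustration; they are not part of the CGI interface.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the form data from `in` (normally stdin).
   `content_length` is the text of the CONTENT_LENGTH variable.
   Returns a NUL-terminated string which the caller must free(), or
   NULL if the length is missing, not a number, larger than max_len,
   or if fewer bytes than announced could be read. */
char *read_form_data(FILE *in, const char *content_length, size_t max_len)
{
    char *end;
    unsigned long length;
    char *data;

    if (content_length == NULL || content_length[0] == '\0')
        return NULL;
    length = strtoul(content_length, &end, 10);
    if (*end != '\0' || length > max_len)
        return NULL;

    data = malloc(length + 1);
    if (data == NULL)
        return NULL;
    if (fread(data, 1, length, in) != length) {
        free(data);
        return NULL;
    }
    data[length] = '\0';
    return data;
}
```

A CGI program might call read_form_data(stdin, getenv("CONTENT_LENGTH"), 65536) and treat a NULL result as a bad request. The PERL script continues below.
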
foreach $keyvalue (@info_returned)
{
   ($key, $value) = split(/=/, $keyvalue);
   $value =~ tr/+/ /;
   $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C",hex($1))/eg;
   $pairs{$key}=$value;
}
Then for each item in the array we split it on ='s into key, value pairs. All spaces which have been encoded into +'s are then translated back using PERL's tr operator. We then use a regular expression to convert the hexadecimal ASCII values back into the real characters. Finally the last line builds an associative array indexed on the HTML input labels.
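
For comparison, here is a minimal C sketch of the same decoding step. It reverses the two transformations described above, turning +'s back into spaces and each % followed by two hexadecimal digits back into a single character. The function name url_decode and the choice to decode the string in place are assumptions made for this illustration.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: decode a form value in place.
   '+' becomes a space and "%XX" (two hexadecimal digits) becomes the
   character with that value. A '%' which is not followed by two
   hexadecimal digits is copied unchanged, as in the PERL version. */
void url_decode(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' &&
                   isxdigit((unsigned char)s[1]) &&
                   isxdigit((unsigned char)s[2])) {
            char hex[3];
            hex[0] = s[1];
            hex[1] = s[2];
            hex[2] = '\0';
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

Decoding in place is safe here because the decoded text is never longer than the encoded text.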

$name=$pairs{'name'};
$email=$pairs{'email'};
$machine=$ENV{'REMOTE_HOST'};
We then set up three more variables. The first two store the values we actually wanted from the associative array. The last reads the remote machine name from an environment variable.
print "Content-Type: text/html", "\n\n";
print "<HTML>","\n";
print "<HEAD><TITLE>Form example output</TITLE></HEAD>", "\n";
print "<BODY><H1>Welcome</H1>","\n";
print "<HR>","\n";
print "Hi <em>",$name, "</em> of <em>",$email, "</em><p>";
print "from machine <em>",$machine, "</em><br>", "\n";
print "<p>";
print "<HR>","\n";
print "</BODY></HTML>";
exit 0;
From here on it's plain sailing. We output the HTTP Content-Type, and the necessary HTML to build the page.
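
One caution ties back to the security concerns mentioned in the introduction: the script above prints the name and email values exactly as the user typed them, so any HTML they contain becomes part of the page. A common precaution is to escape the characters that are special in HTML before printing them. The C sketch below shows the idea. The function name print_html_escaped is an assumption made for this illustration, and the same substitutions could be made in PERL before the values are printed.

```c
#include <stdio.h>

/* Hypothetical helper: write `text` to `out`, replacing the
   characters that are special in HTML with character entities. */
void print_html_escaped(FILE *out, const char *text)
{
    for (; *text != '\0'; text++) {
        switch (*text) {
        case '&':  fputs("&amp;", out);  break;
        case '<':  fputs("&lt;", out);   break;
        case '>':  fputs("&gt;", out);   break;
        case '"':  fputs("&quot;", out); break;
        default:   fputc(*text, out);    break;
        }
    }
}
```

A C version of the form program would call print_html_escaped(stdout, name) in place of printing the raw value.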