Introduction to CGI Programming

CGI Programming with Advanced Topics

Nancy McCracken
Courseware: Software Technologies for the WWW
3-234 Center for Science and Technology
Syracuse University
111 College Place
Syracuse NY 13244-4100
February 17, 1997

CGI Programming

CGI is the Common Gateway Interface, the scheme for interfacing other programs and systems to the HTTP Web protocol, using the same data protocols as HTTP clients and servers.

In this section, we will cover:
- passing information from the web page to the CGI script
- processing information on the server and returning formatted web pages back to the web client
- an example using Perl as the scripting language
- brief descriptions of other CGI capabilities

References:
- HTML and CGI Unleashed, John December and Mark Ginsburg, chapters 19, 20 and 21, Sams.net Publishing.
- CGI Programming on the World Wide Web, Gundavaram, O'Reilly & Associates.
- The CGI Book, William Weinman, New Riders Publishing.
- Web documents.

The Flow of Data amongst the Client, Server and CGI Script

The client sends a request, conforming to the URL standard and formatted with a MIME header, to the server.

The server parses the request and decides what to do:
- For FTP and other services, the server makes an appropriate request of its operating system and responds.
- For HTTP service, it retrieves the file named by the URL and decides what to do based on file type.
  - An html, mpeg, au, or other file with a recognizable file extension is returned directly to the client with no further processing (except in the case of Server Side Includes, SSI).
  - If the file is executable, the server executes it as a CGI program. The server processes the header to pass execution parameters as environment variables or as a STDIN stream to the CGI program.
The CGI program parses the input from the server and MUST generate a response. Even if there is no data to send back, the CGI program must send an error or empty message, since the HTTP connection is still open and must be closed by the server.

The CGI program sends a header to the server:
- If the header is of type "Location", the server will send the indicated file to the client.
- If the header is "Content-type", the server will send all the data back to the client. This should be a properly formatted html page.

When the CGI program terminates, the server closes the connection.

Example form for Hello, World!

This example consists of a simple form with just a submit button to activate the CGI program. Note that no data is sent from the form to the CGI program in this simple example.

Example CGI program in Perl for Hello, World!

The Perl program returns output which is properly formatted HTML. The server returns it to the browser, which displays it as a page.

Returning the html output is simple, because the server and browser handle the encoding and decoding of the MIME formatted message. The complications arise in sending text from the form to the CGI program: there are several ways to do it, and the CGI program must decode the message.

Pass Data to a CGI Program through Environment Variables

Two environment variables, QUERY_STRING and PATH_INFO, are used to pass data to the CGI program, and there are several ways to set them.

Using a normal HTML link to pass data:
- Everything after the first question mark in a URL is put into the QUERY_STRING variable:
  Click here to run the program.
  The QUERY_STRING variable is "passed-argument".
- Everything after an executable in the path name is put into the PATH_INFO variable:
  Start the program.
  The PATH_INFO variable is "direction=north/speed=slow".

These techniques are required by some search engines, such as WAIS, to pass keywords for the search.
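As a rough illustration of the two variables just described, the following sketch builds a complete CGI response from QUERY_STRING and PATH_INFO. The subroutine names (escape_html, parse_path_info, build_response) are hypothetical and not part of the course material, and the sketch assumes Perl 5.10 or later. It also assumes that some servers put a leading "/" in front of the PATH_INFO value, and that the values should be escaped before being echoed into HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace the characters that are special in HTML with numeric character
# references, so that user-supplied text cannot inject markup.
sub escape_html {
    my ($text) = @_;
    $text =~ s/([&<>"])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Split a PATH_INFO value such as "direction=north/speed=slow" into a hash.
# A leading "/" is tolerated (assumption: some servers include it).
sub parse_path_info {
    my ($path_info) = @_;
    my %args;
    for my $piece (grep { length } split m{/}, $path_info // '') {
        my ($key, $value) = split /=/, $piece, 2;
        $args{$key} = defined $value ? $value : '';
    }
    return %args;
}

# Build the whole response: Content-type header, blank line, HTML body.
sub build_response {
    my ($env) = @_;
    my $query = escape_html($env->{QUERY_STRING} // '');
    my %args  = parse_path_info($env->{PATH_INFO});

    my $html = "<html><body>\n<p>QUERY_STRING is \"$query\"</p>\n";
    for my $key (sort keys %args) {
        $html .= '<p>' . escape_html($key) . ' = '
               . escape_html($args{$key}) . "</p>\n";
    }
    $html .= "</body></html>\n";
    return "Content-type: text/html\n\n" . $html;
}

# When the server runs this file as a script, answer from the real environment.
print build_response(\%ENV) unless caller;
```

Keeping the response in a subroutine that takes the environment as an argument makes it easy to try the logic from the command line with a hand-built hash before installing the script under cgi-bin.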
Another way to pass data through environment variables

When a form uses the method "get" to pass data, all the input that the user types is put in the QUERY_STRING variable:
Type your first name
Type your last name
If the user types "Winona" and "Ryder" as values, the QUERY_STRING environment variable would have the encoded value "First+Name=Winona&Last+Name=Ryder". Here is a Perl program that would print that string in the web client's window (without regular html tags):

    #!/usr/local/bin/perl
    print "Content-type: text/html\n\n";
    print "You typed \"$ENV{QUERY_STRING}\" in the input boxes\n";

METHOD=GET is NOT RECOMMENDED, because input that is too long can be truncated and lost!

Other Information in environment variables

The web server also makes available information about the user and the server, including such things as what type of browser made the request. A list of environment variables available to the CGI program:

    GATEWAY_INTERFACE    REMOTE_HOST
    SERVER_NAME          REMOTE_ADDR
    SERVER_SOFTWARE      AUTH_TYPE
    SERVER_PROTOCOL      REMOTE_USER
    SERVER_PORT          REMOTE_IDENT
    REQUEST_METHOD       CONTENT_TYPE
    PATH_INFO            CONTENT_LENGTH
    PATH_TRANSLATED      HTTP_FROM
    SCRIPT_NAME          HTTP_ACCEPT
    DOCUMENT_ROOT        HTTP_USER_AGENT (browser)
    QUERY_STRING         HTTP_REFERER

Passing data as Standard Input to the CGI program

It is recommended to use a form with METHOD=POST to safely pass any amount of data through STDIN. The CONTENT_LENGTH environment variable is set to the number of bytes being sent, and the CONTENT_TYPE variable is set to "application/x-www-form-urlencoded".

The data is encoded by the browser before it is sent:
- The fields are separated by an unencoded &.
- Within each field, an unencoded = separates the field name from the input form and the data.
- Spaces within a field are translated to +.
- Certain other keyboard characters are encoded as % followed by their two-digit hex code; for example, ! becomes %21.

Perl subprogram to read input from web forms - Part I

This subroutine works with either the GET or POST method, reading the user input string from the form into a scalar variable "$in". It then splits this string into fields in the array "@in", where each element contains the encoded string for one field.
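The subroutine's listing is not reproduced in this text, so the following is only a minimal sketch of the same idea, not the course's own code. The subroutine names (url_decode, read_form_input, parse_form) are hypothetical, Perl 5.10 or later is assumed, and joining repeated fields with "\0" is a choice made for this sketch. It reads the raw string for either method, splits it on "&" and "=", and reverses the encoding rules listed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one URL-encoded string: "+" becomes a space, %XX becomes the byte XX.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Fetch the raw form data: QUERY_STRING for GET, or CONTENT_LENGTH bytes
# from the given input handle for POST.
sub read_form_input {
    my ($env, $fh) = @_;
    my $method = uc($env->{REQUEST_METHOD} // 'GET');
    return $env->{QUERY_STRING} // '' if $method eq 'GET';

    my $in     = '';
    my $length = $env->{CONTENT_LENGTH} // 0;
    read($fh, $in, $length) if $length > 0;
    return $in;
}

# Split the raw string into fields on "&", split each field on its first "=",
# decode both halves, and collect the results in a hash.
sub parse_form {
    my ($in) = @_;
    my %in;
    for my $field (split /&/, $in) {
        my ($key, $value) = split /=/, $field, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        $key   = url_decode($key);
        $value = url_decode($value);
        # Assumption of this sketch: repeated fields are joined with "\0".
        $in{$key} = exists $in{$key} ? "$in{$key}\0$value" : $value;
    }
    return %in;
}
```

In a CGI script this might be called as `my %in = parse_form(read_form_input(\%ENV, \*STDIN));`. Splitting before decoding matters: a decoded value may itself contain "&" or "=", which would otherwise be mistaken for separators.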
Perl subprogram - Part II

For each field string, the subroutine decodes all the encoding symbols. It then creates an associative array "%in" with a keyword,value pair from each field of the web form. This subroutine can be used without change in any Perl CGI program.

CGI Program Output: the response to the web server

All output written by the CGI program to STDOUT is taken by the server to process. The output should start with a header of one of three types:
- Location: the server sends another file to the client (and terminates the connection).
    print "Location: http://www.some.box.com/the_other_file.html\n\n";
- Status: the server returns a status message to the client (and terminates the connection).
    print "Status: 301 Moved Permanently\n";
- Content-type: the server sends all remaining output to the client (after the mandatory blank line!), terminating only when the script does.
    print "Content-type: text/html\n\n";

Some CGI programming practical tips

- On the web server where you are doing CGI programming, put the HTML pages with forms in a directory somewhere under the server's "document root" and the CGI program somewhere under the server's "cgi-bin".
- The CGI program must have its permissions set so that it can be executed by the server. Furthermore, if the CGI program reads or writes other files, then the server must have permission to do so.
- You can first debug your Perl program by executing it directly in the cgi-bin directory and providing test input in a file:
    prog.pl < input.data
- When a CGI program crashes, an error should show up in the server's error_log file.

More practical tips

Your Perl script is run with the current directory set to the cgi-bin directory in which it resides, so path names in any file system accesses that you program in Perl are evaluated relative to that directory.
Suppose that you have the file system structure in the example below; then to open file1.txt or file2.txt from prog.pl, use:
    open (FILE1, "file1.txt");
and
    open (FILE2, "../../htdoc/njm/file2.txt");

Password Protection on HTML Documents

Many servers, including NCSA, use an Access Control File (ACF) to configure basic authorization for access to all web documents in a directory.
- The global ACF is named access.conf in the server's configuration directory.
- Any directory in the server's document space can have a local ACF named .htaccess.

Here is the format of an NCSA ACF:

Example protection files

An example of a .htaccess file

The password file can be created with a program called htpasswd (which creates a file called .htpasswd) that is distributed with the server. The password file should not be kept in a publicly accessible directory.

Dynamic Web Pages --- Server Push

A web server can return a sequence of replies to the web browser by running a Non-Parsed Header (NPH) CGI script, whose name usually has the form: nph-myprogram.cgi

The script also uses a special MIME type, multipart/x-mixed-replace, which allows each reply to replace the previous one on the same browser page. The main part of the document is a container with boundary strings between the individual entities, each starting with 2 dashes. The final boundary string also ends with 2 dashes to terminate the entire container.

A Server-Push Animation

Suppose we have a set of gif files that make up an animation sequence: m01.gif, m02.gif, . . . The Perl program reads the list of file names from a text file and sends them as parts in a multipart MIME stream.

Multiple-Block GIF Files (Animated GIFs)

The GIF89a specification allows a GIF file to have multiple blocks of data, each with a different image. The images can be set to display one after another or as overlays that partially replace sections of the preceding image. They can be set to have time delays and also to loop.
Multiple-Block GIFs can be created by a program called GIF Construction Set, from Alchemy Mindworks: http://www.north.net/alchemy/alchemy.html

Dynamic Web Pages --- Client Pull

The web server can send an HTTP response with a header of type Refresh. This causes the browser to request a new page automatically after a specified time.

In Server Push, the HTTP connection is kept open between the browser and the server for the duration of the responses. In Client Pull, the connection is closed and reopened each time the browser sends another request for a refresh. Thus Server Push is best for small files that are sent at short intervals, such as animations of small images, and Client Pull is best for larger or longer-interval transmissions, such as stock ticker updates.

Maintaining State with Hidden Fields

Any application with multiple forms processed by one or more cgi scripts may use hidden fields to pass information from one form to the next. A typical example is a shopping basket application. The first page can collect user information. Note that hidden fields are not secret: the user can see them by selecting "View - Document Source".

Hidden Fields in a "Shopping Basket"

After the initial information page, a cgi script produces the first shopping catalog page, adding hidden fields for the user information and initializing a hidden field for the "current shopping basket".

Finishing the "Shopping Basket"

Each time the user submits a form from a shopping catalog page, the cgi script adds the selected items to the current shopping basket field. Finally, the order is itemized and the transaction is completed.

Maintaining State with Netscape Cookies

The Netscape browser can store information in fields on the client side. The browser stores this information in a text file in the browser directory and can thus provide "persistent" cookies: the information is saved across many browser invocations as well as across transactions.
Basically, the browser sets up a cookie whenever it receives a "Set-Cookie" header from a server, and it passes the information back in the "HTTP_COOKIE" environment variable whenever the user requests a document that fits the validity parameters of the Set-Cookie header.

Cookies are more general than hidden fields, but they currently work only on Netscape browsers; work is being done on a general standard for an HTTP state-info mechanism. The current cookie specification is at URL: http://home.netscape.com/newsref/std/cookie_spec.html

Setting up a cookie

Each cookie must have a key,value pair defining the cookie. In addition, there are optional fields defining the validity of cookie requests:
- expires - gives the day and time in GMT up to which the browser should save the cookie.
- path - sets the subset of the document space (URLs) on the server for which the cookie is valid.
- domain - sets the domain for which the cookie is valid. Domains must contain at least two periods, so a value as general as ".com" is not allowed.
- secure - when this attribute is set, the cookie is sent only over a secure channel, i.e. an SSL connection to a secure server such as Netscape's.

Send one or more cookies using the Set-Cookie field in a header:

Retrieving a cookie

If a request for a document or script satisfies the validity requirements, then the environment variable HTTP_COOKIE is set to hold all the cookie values. Cookies are key,value pairs and are encoded just like all other fields, EXCEPT that they are separated by ; (instead of &).

In Perl, $ENV{'HTTP_COOKIE'} could have the value:
    full+name=Max+Planck;Occupation=physicist

This can be decoded in the same way as in the standard ReadParse subroutine, except that you must split the fields on ";".

Server Side Includes (SSI)

SSI allows the server to insert data into documents that are requested by the browser. The server is usually configured so that these modifiable documents have the file extension .shtml.
(The server associates this file extension with the MIME type text/x-server-parsed-html.) SSI slows down the server, since it must check for .shtml files and parse them. SSI also allows the execution of programs and so has some of the functionality of CGI scripts.

A .shtml file has regular html, plus it may have SSI entries of the form:
    <!--#command tag="value" -->

Some SSI Commands

The echo command is used with the var tag to print the values of any environment variables from the standard list known to web servers/browsers, plus this set for SSI: DOCUMENT_NAME, DOCUMENT_URI, QUERY_STRING_UNESCAPED, DATE_LOCAL, DATE_GMT, LAST_MODIFIED.

The include command, with either the file or the virtual tag, includes the contents of the indicated files.

The exec command allows the execution of other programs. These programs don't necessarily have to be in the server's cgi-bin directory, but they may raise special security issues.