Introduction to CGI Programming

Nancy McCracken
ECS 400, Software Technologies for the WWW
3-234 Center for Science and Technology
Syracuse University
111 College Place
Syracuse NY 13244-4100
February 26, 1996

CGI Programming

CGI is the Common Gateway Interface. It is the scheme for connecting other programs and systems to the HTTP Web protocol, and it uses the same data protocols as HTTP clients and servers. In this section, we will cover:

- passing information from the web page to the CGI script
- processing information on the server and returning formatted web pages to the web client
- an example using Perl as the scripting language
- brief descriptions of other CGI capabilities

References:

- John December and Mark Ginsburg, HTML and CGI Unleashed, chapters 19, 20 and 21, Sams.net Publishing.
- Gundavaram, CGI Programming on the World Wide Web, O'Reilly & Associates.
- William Weinman, The CGI Book, New Riders Publishing.
- Web documents.

The Flow of Data amongst the Client, Server and CGI Script

The client sends a request to the server. The request conforms to the URL standard and is formatted with a MIME header. The server parses the request and decides what to do:

- For FTP and other services, the server makes an appropriate request of its operating system and responds.
- For HTTP service, the server retrieves the file named by the URL and decides what to do based on the file type:
  - An html, mpeg, au, or any other file with a recognizable file extension is returned directly to the client with no further processing (except in the case of Server Side Includes, SSI).
  - If the file is executable, the server runs it as a CGI program. The server processes the request header and passes execution parameters to the CGI program, either as environment variables or as a STDIN stream.
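The server's decision between returning a file and running it can be sketched in Perl as follows. This is only a minimal illustration, not how any particular server is written. The extension table and the function name dispatch are hypothetical, and a real server reads these mappings from its configuration rather than from a hard-coded list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical extension table: files with these extensions are returned
# to the client as-is, with no CGI processing.
my %STATIC_TYPE = (
    html => 'text/html',
    mpeg => 'video/mpeg',
    au   => 'audio/basic',
);

# Decide what a simple HTTP server would do with the file named by a URL.
# Returns ('static', $mime_type), ('cgi', undef) or ('other', undef).
sub dispatch {
    my ($path) = @_;
    my ($ext) = $path =~ /\.([^.\/]+)\z/;

    # Recognizable extension (case-insensitive): send the file directly.
    if (defined $ext && exists $STATIC_TYPE{ lc $ext }) {
        return ('static', $STATIC_TYPE{ lc $ext });
    }

    # An executable regular file is run as a CGI program.
    # "_" reuses the stat result from the -f test.
    return ('cgi', undef) if -f $path && -x _;

    # Anything else (missing, or not executable) is left to other handling.
    return ('other', undef);
}
```

For example, dispatch('/docs/index.html') returns ('static', 'text/html') whether or not the file exists, because in this sketch the extension is checked before the executable test.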
The Flow of Data amongst the Client, Server and CGI Script (continued)

The CGI program parses the input from the server and MUST generate a response. Even if there is no data to send back, the CGI program must send an error or an empty message, because the HTTP connection is still open and must be closed by the server. The CGI program sends a header to the server:

- If the header is of type "Location", the server sends the indicated file to the client.
- If the header is "Content-type", the server sends all of the data that follows back to the client. This should be a properly formatted HTML page.

When the CGI program terminates, the server closes the connection.

Example form for Hello, World!

This example consists of a simple form with just a submit button to activate the CGI program. Note that no data is sent from the form to the CGI program in this simple example.

Example CGI program in Perl for Hello, World!

The Perl program returns output that is properly formatted HTML. The server returns it to the browser, which displays it as a page. Returning the HTML output is simple, because the server and browser handle the encoding and decoding of the MIME-formatted message. The complications arise in sending text from the form to the CGI program: there are several ways to do it, and the CGI program must decode the message.

Pass Data to a CGI Program through Environment Variables

Two environment variables, QUERY_STRING and PATH_INFO, are used to pass data to the CGI program, and there are several ways to set them. One is to use a normal HTML link:

- Everything after the first question mark in a URL is put into the QUERY_STRING variable: Click here to run the program. The QUERY_STRING variable is "passed-argument".
- Everything after the name of the executable in the path is put into the PATH_INFO variable: Start the program. The PATH_INFO variable is "/direction=north/speed=slow" (the extra path, including its leading slash).

Some search engines, such as WAIS, require these techniques for passing search keywords.
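From the CGI program's side, both variables arrive as plain strings in %ENV. The sketch below splits PATH_INFO into pairs and echoes both variables as plain text. It assumes the name=value segments from the PATH_INFO example above, which are a convention of that example and are not imposed by the server. The function name parse_path_info is hypothetical, and the REQUEST_METHOD check only keeps the script quiet when it is loaded outside a web server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split PATH_INFO such as "/direction=north/speed=slow" into name/value
# pairs. The server only hands over the raw extra path; the "name=value"
# segment format is this example's own convention.
sub parse_path_info {
    my ($path_info) = @_;
    my %pairs;
    # Drop empty segments, including the one before the leading slash.
    for my $segment (grep { length } split m{/}, $path_info // '') {
        my ($name, $value) = split /=/, $segment, 2;
        $pairs{$name} = $value // '';
    }
    return %pairs;
}

# Only produce a CGI response when a server has set REQUEST_METHOD.
if ($ENV{REQUEST_METHOD}) {
    my %opt = parse_path_info($ENV{PATH_INFO});
    print "Content-type: text/plain\n\n";
    print "QUERY_STRING: ", ($ENV{QUERY_STRING} // ''), "\n";
    print "$_ = $opt{$_}\n" for sort keys %opt;
}
```

With the link from the example, the script would report direction = north and speed = slow.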
Another way to pass data through environment variables

When a form uses METHOD="get" to pass data, all the input that the user types is put in the QUERY_STRING variable, which is appended to the URL after a question mark.
If the user types "Winona" and "Ryder" as the values of the fields named "First Name" and "Last Name", the QUERY_STRING environment variable has the encoded value "First+Name=Winona&Last+Name=Ryder". Here is a Perl program that would print that string in the web client's window (without regular HTML tags):

    #!/usr/local/bin/perl
    print "Content-type: text/html\n\n";
    print "You typed \"$ENV{QUERY_STRING}\" in the input boxes\n";

METHOD=GET is NOT RECOMMENDED for long input. Browsers and servers limit the length of a URL, so input that is too long can be lost!

Other Information in environment variables

The web server also provides information about the user and the server, such as the type of browser that made the request. The environment variables available to the CGI program include:

    GATEWAY_INTERFACE    REMOTE_HOST
    SERVER_NAME          REMOTE_ADDR
    SERVER_SOFTWARE      AUTH_TYPE
    SERVER_PROTOCOL      REMOTE_USER
    SERVER_PORT          REMOTE_IDENT
    REQUEST_METHOD       CONTENT_TYPE
    PATH_INFO            CONTENT_LENGTH
    PATH_TRANSLATED      HTTP_FROM
    SCRIPT_NAME          HTTP_ACCEPT
    DOCUMENT_ROOT        HTTP_USER_AGENT (browser)
    QUERY_STRING         HTTP_REFERER

Passing data as Standard Input to the CGI program

A form with METHOD=POST is recommended because it can safely pass any amount of data through STDIN. The CONTENT_LENGTH environment variable is set to the number of bytes being sent, and the CONTENT_TYPE variable is set to "application/x-www-form-urlencoded". The browser encodes the data before sending it:

- The fields are separated by an unencoded &.
- Within each field, an unencoded = separates the field name (from the input form) from the data.
- Spaces within a field are translated to +.
- Certain other keyboard characters are encoded as %[hex equivalent]. For example, ! becomes %21 and = becomes %3D.

Perl subprogram to read input from web forms - Part I

This subroutine works with either the GET or the POST method. It reads the user input string from the form into a scalar variable "$in". It then splits this string on & into the array "@in", where each element contains the encoded string for one field.
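Here is a Perl sketch of this two-part approach, with some assumptions. The names read_raw_input, url_decode and parse_form_data are hypothetical. The input is treated as bytes. A repeated field name keeps only its last value, which is the checkbox limitation mentioned in Part II. Each field is split on = before decoding so that an encoded %3D in the data is not mistaken for the separator. Also, + is turned into a space before the %XX sequences are decoded, so an encoded %2B comes out as a literal +.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Part I: fetch the raw, still-encoded input string, either from
# QUERY_STRING (GET) or from exactly CONTENT_LENGTH bytes of STDIN (POST).
sub read_raw_input {
    my $method = $ENV{REQUEST_METHOD} // 'GET';
    if ($method eq 'POST') {
        my $in = '';
        read(STDIN, $in, $ENV{CONTENT_LENGTH} // 0);
        return $in;
    }
    return $ENV{QUERY_STRING} // '';
}

# Part II helper: "+" -> space first, then %XX -> the character with that hex code.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Split on "&" into @in (Part I), then decode each field into %in (Part II).
sub parse_form_data {
    my ($in) = @_;
    my @in = split /&/, $in // '';
    my %in;
    for my $field (@in) {
        next unless length $field;    # ignore empty fields such as "a=1&&b=2"
        my ($name, $value) = split /=/, $field, 2;
        # A repeated name overwrites the earlier value in this sketch.
        $in{ url_decode($name) } = url_decode($value // '');
    }
    return %in;
}
```

In a CGI program you would call it as my %in = parse_form_data(read_raw_input()); and then read, for example, $in{'First Name'}.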
Perl subprogram - Part II

For each field string, the subroutine decodes all the encoded symbols. It then creates an associative array "%in" with a keyword/value pair for each field of the web form. This subroutine can be used without change in any Perl CGI program, unless your form has checkboxes that may return the same name with more than one value.

CGI Program Output: the response to the web server

The server takes all output that the CGI program writes to STDOUT and processes it. The output should start with a header of one of three types, followed by a blank line:

- Location: the server sends another document to the client (and terminates the connection). With a full URL, the client is redirected to that URL; with a local path, the server returns that file itself.

      print "Location: http://www.some.box.com/the_other_file.html\n\n";

- Status: the server returns a status message to the client (and terminates the connection).

      print "Status: 204 No Content\n\n";

- Content-type: the server sends all remaining output to the client (after the mandatory blank line!), and stops only when the script terminates.

      print "Content-type: text/html\n\n";

Some CGI programming practical tips

On the web server where you do your CGI programming, put the HTML pages with forms in a directory somewhere under the server's "document root", and put the CGI program somewhere under the server's "cgi-bin" directory. The CGI program's permissions must allow the server to execute it (for example, chmod 755 prog.pl on Unix). Furthermore, if the CGI program reads or writes other files, then the user ID that the server runs under must have permission to do so.

You can first debug your Perl program by running it directly in the cgi-bin directory and supplying test input from a file:

    ./prog.pl < input.data

A program that reads POST data from STDIN also needs REQUEST_METHOD=POST and CONTENT_LENGTH (the size of input.data in bytes) set in the environment, as the server would set them. When a CGI program crashes, an error should show up in the server's error_log file.

More practical tips

Your Perl script runs with its own cgi-bin directory as the current directory. Relative path names in any file accesses in your Perl program are therefore resolved from that directory.
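Suppose, for example, that file1.txt is in the same cgi-bin directory as prog.pl, and file2.txt is in htdoc/njm, where htdoc is in the directory two levels above cgi-bin. Then to open file1.txt or file2.txt from prog.pl, use:

    open(my $file1, '<', 'file1.txt') or die "Cannot open file1.txt: $!";
    open(my $file2, '<', '../../htdoc/njm/file2.txt') or die "Cannot open file2.txt: $!";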
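You may prefer paths that do not depend on the directory the server starts the script in. One option is the core FindBin module, whose $Bin variable holds the directory that contains the running script. The sketch below assumes the same hypothetical layout as the example above and uses a hypothetical helper named script_relative. It prints the current directory next to the resolved paths so you can compare them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;
use FindBin qw($Bin);

# Build a path relative to the directory that contains this script ($Bin),
# so the result does not depend on the server's current directory.
sub script_relative {
    my (@parts) = @_;
    return File::Spec->catfile($Bin, @parts);
}

# Hypothetical layout from the example: file1.txt next to the script,
# file2.txt in htdoc/njm two levels up.
my $file1 = script_relative('file1.txt');
my $file2 = script_relative('..', '..', 'htdoc', 'njm', 'file2.txt');

print "Content-type: text/plain\n\n";
print "Current directory: ", getcwd(), "\n";
print "file1.txt resolves to $file1\n";
print "file2.txt resolves to $file2\n";
```

When the server does set the current directory to cgi-bin, both approaches name the same files. The $Bin form also keeps working if you run the script from another directory while debugging.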