This simple discussion of PERL describes the essential features needed for general-purpose programming
|
It does not describe the special concerns needed for systems programming but is aimed at what you need for writing CGI programs
|
We reference in detail the Llama Book: Learning PERL (2nd ed) by Randal L. Schwartz and Tom Christiansen published by O'Reilly and Associates. ISBN: 1-56592-284-0
|
More detailed is the Camel book: Programming PERL (2nd ed) by Larry Wall, Tom Christiansen, and Randal L. Schwartz, also by O'Reilly. ISBN: 1-56592-149-6
-
This is one of few authoritative Perl5 discussions
|
Another useful book which lies between the Llama and Camel books in completeness is: PERL by Example by Ellie Quigley, Prentice Hall. ISBN 0-13-122839-0
|
PERL may be run interactively from the command line. For example,
-
% perl -v
|
will report the version of PERL in use.
|
The PERL command
-
% perl -e 'print "@INC\n";'
|
prints the search path for included files.
|
The command
-
% perl -MLWP -e 'print "libwww-perl-$LWP::VERSION\n";'
|
prints the installed version of LWP.
|
The complex PERL command
-
% perl -pi.bak -e 's/str1/str2/gi' `find . -name \*.html -print`
|
performs a case insensitive, global search-and-replace on all files ending with "html" in the current directory and all its subdirectories.
|
Assignments are $Next_Course = "CPS615";
|
$Funding = $Funding + $Contract;
|
For numeric quantities, the latter can be written as
|
$Funding += $Contract;
|
Similarly we can write for strings:
|
$Name= "Geoffrey"; $Name .= " Fox"; # sets $Name to "Geoffrey Fox"
|
Example: $A = 6; $B = ($A +=2); # sets $A = $B = 8
|
Auto-increment and auto-decrement work as in C
-
$a = $a + 1; $a += 1; and ++$a; # are the same and increment $a by 1
|
++ and -- can each be used BEFORE (prefix) or AFTER (suffix) the variable (operand). Both forms change the operand in the same way, but in the suffix form the value used in the expression is the value BEFORE the variable was incremented.
-
$a=3; $b = (++$a); # sets $a and $b to 4
-
$a=3; $b = ($a++); # sets $a to 4 and $b to 3
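|
The assignment and increment forms above can be checked with a short script. This is a minimal sketch; the variable names ($funding, $prefix, and so on) are illustrative and chosen to avoid the special sort variables $a and $b.

```perl
use strict;
use warnings;

my $funding  = 100;
my $contract = 25;
$funding += $contract;     # same as $funding = $funding + $contract

my $name = "Geoffrey";
$name .= " Fox";           # string append

my $n = 3;
my $prefix = ++$n;         # $n is incremented first, so $prefix gets 4

my $m = 3;
my $suffix = $m++;         # $suffix gets the old value 3, then $m becomes 4

print "$funding $name $prefix $suffix $n $m\n";   # prints: 125 Geoffrey Fox 4 3 4 4
```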
|
We can use scalar variables in strings
-
$h= "World"; $hw= "Hello $h"; # sets $hw to "Hello World"
-
$h= "World"; $hw= "\UHello $h"; # sets $hw to "HELLO WORLD"
-
showing how \U and similarly \L operate on interpolated variables
|
Remember, there is NO interpolation for single-quoted strings!
|
There is also no recursion as illustrated below:
|
$fred= "You over there"; $x= '$fred'; $y= "Hey $x"; # sets $y as 'Hey $fred' with no interpolation
|
Use \$ to ensure no interpolation where you need literal $ character
-
$fred= "You over there"; $y= "Hey \$fred"; # sets $y as "Hey $fred" with no interpolation whereas:
-
$fred= "You over there"; $y= "Hey $fred"; # sets $y as "Hey You over there" with interpolation
|
Use ${var} to remove ambiguity as in
-
$y= "Hey ${fred}followed by more characters";
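|
The interpolation rules above can be seen side by side in a small script. This is a sketch only; $word is a hypothetical variable added to show why the ${var} form is needed.

```perl
use strict;
use warnings;

my $fred = "You over there";
my $word = "file";                 # hypothetical, for the ${var} example

my $single  = 'Hey $fred';         # single quotes: no interpolation
my $double  = "Hey $fred";         # double quotes: interpolation
my $escaped = "Hey \$fred";        # \$ keeps a literal dollar sign
my $x       = '$fred';
my $once    = "Hey $x";            # the result is not scanned a second time
my $braces  = "${word}s";          # without braces Perl would look for $words
my $upper   = "\UHello $fred";     # \U upper-cases everything that follows

print "$_\n" for ($single, $double, $escaped, $once, $braces, $upper);
```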
|
Logical and: && as in $x && $y;
|
If $x is true, evaluate $y and return $y
|
If $x is false, don't evaluate $y and return $x (a false value)
|
Logical or: || as in $x || $y;
|
If $x is true, don't evaluate $y and return $x (a true value)
|
If $x is false, evaluate $y and return $y
|
Logical not: ! as in ! $x;
|
Return the negation of $x
|
and, or, and not behave like &&, ||, and ! respectively, but have lower (indeed the lowest) precedence (see Table 2-3 in the Llama Book)
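|
A short sketch of short-circuit evaluation and of the precedence difference between || and or. The subroutine touch() is a hypothetical helper that only counts how often it is called.

```perl
use strict;
use warnings;

my $calls = 0;
sub touch { $calls++; return "touched"; }   # hypothetical helper

my $zero = 0;
my $one  = 1;

my $r1 = $zero && touch();    # left side false: touch() is skipped
my $r2 = $one  || touch();    # left side true:  touch() is skipped
my $r3 = $one  && touch();    # left side true:  touch() runs
my $r4 = $zero || "default";  # left side false: right side is returned

# || binds tighter than =, while "or" binds looser than =.
my $hi = $zero || 7;                     # parsed as $hi = ($zero || 7)
my @log;
my $lo;
$lo = $zero or push @log, "fallback";    # parsed as ($lo = $zero) or push(...)

print "calls=$calls r1=$r1 r2=$r2 r3=$r3 r4=$r4 hi=$hi lo=$lo log=@log\n";
```

The || form is a common way to supply a default value, while the or form is usually reserved for flow control after an assignment.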
|
More complicated lists may be constructed:
|
($a, @fred) = @fred; # sets $a to first element of @fred and then removes this first element from @fred
|
($a,$b,$c) = (1,2,3); # sets $a=1, $b=2, $c=3
|
Curiously, setting a scalar equal to an array returns the length of the array
|
$a = @fred; # returns $a as length of @fred
|
The function length returns the number of characters in a string
|
$a = length($fred[0]); # returns length in characters of first entry in @fred (length(@fred) would instead evaluate @fred in scalar context and measure its element count)
|
($a) = @fred; # sets $a to be first entry of @fred
|
Note that @fred (an array) and $fred (a scalar) are totally different variables
|
The elements of @fred are indexed starting at 0 (not 1, as in Fortran)
|
Elements are referenced by $ NOT @
|
$a = $fred[0]; # is first element in @fred
|
PERL interpolates arrays as well as scalars:
|
$fred[0]= "First element of \@fred";
|
Indices may be arbitrary integer expressions:
|
@fred = (0..10); $a = 2;
|
$b = $fred[$a-1]; # sets $b equal to 1
|
When variables are undefined or set to undefined as in
-
$a = $b ; # $b not defined
|
they are given the special value undef, which typically behaves like the null (empty) character string or the numeric value zero
-
<STDIN> returns undef when EOF is reached
-
@fred = (0,1,2,3); $a = $fred[6]; # sets $a equal to undef
-
@fred = (0,1,2,3); $fred[6] = 7; $a = $fred[5]; # sets $fred[6] to 7 and leaves $a, $fred[4], and $fred[5] undefined
|
$index = $#fred; # sets $index to index value of last entry in @fred
-
$a = @fred; $b = $#fred; # the expression $b == $a - 1 is true
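|
The array rules above (scalar versus list context, $#fred, and undef for missing elements) can be sketched as follows. The values stored in @fred are illustrative.

```perl
use strict;
use warnings;

my @fred = (10, 20, 30, 40);

my $count   = @fred;           # scalar context: number of elements
my ($first) = @fred;           # list context: first element
my $last_ix = $#fred;          # index of the last element, $count - 1

my ($head, @rest) = @fred;     # $head gets 10, @rest gets the remainder

my $missing = $fred[6];        # reading past the end gives undef
$fred[6] = 70;                 # assigning past the end extends the array
my $gap_defined = defined($fred[5]) ? 1 : 0;   # gap elements stay undef

print "count=$count first=$first last_ix=$last_ix new_last_ix=$#fred\n";
```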
|
Useful functions defined() and exists() are available
|
push adds elements at end of a list (array). For example,
|
push(@stack, $new); # equivalent to @stack = (@stack, $new);
|
One can also use a list as the second arg in push:
|
push(@stack, 6, "next", @anotherlist);
|
pop, the inverse operator to push, removes and returns the last element in the list
-
Note chop(@stack) removes last character of each entry of @stack
|
unshift is similar to push, except it works on left (lowest indices) of list. For example,
|
unshift(@INC, $dir);
|
modifies the path for included files on-the-fly
|
shift is to pop as unshift is to push
|
reverse(@list) and sort(@list) leave @list unaltered, but return reversed and sorted lists, respectively
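|
A minimal sketch of the list operators above working on both ends of an array. The arrays @stack and @fruit are illustrative.

```perl
use strict;
use warnings;

my @stack = (1, 2, 3);

push(@stack, 4, 5);              # add at the right: (1,2,3,4,5)
my $top = pop(@stack);           # remove from the right: returns 5

unshift(@stack, 0);              # add at the left: (0,1,2,3,4)
my $bottom = shift(@stack);      # remove from the left: returns 0

my @backwards = reverse(@stack); # new list; @stack itself is unchanged

my @fruit  = ("pear", "apple", "fig");
my @sorted = sort(@fruit);       # new list in string order; @fruit unchanged

print "stack=@stack backwards=@backwards sorted=@sorted\n";
```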
|
foreach is similar to statement of same name in C-shell
|
foreach $element (@some_list) {
|
# loop body executes for each value of $element
|
}
|
$element is local to foreach and subsequently returned to the value it had before the loop executed
|
An example that prints the numbers 1 to 10 is
|
@back = (10,9,8,7,6,5,4,3,2,1);
|
foreach $num ( reverse(@back) ) {
|
print $num, "\n";
|
}
|
One can write more cryptically (a pathological addiction of UNIX programmers):
|
foreach (sort { $a <=> $b } @back) { # numeric sort gives the same list as reverse(@back); plain sort(@back) compares as strings and would put 10 before 2
|
print $_, "\n"; # if loop variable is omitted
-
# PERL uses $_ by default
|
}
|
A hash (sometimes called an associative array) is a "software implemented" associative memory where values are fetched by names or attributes called keys
|
A hash is a set of pairs (key, value)
|
The entire hash is referred to as %dict (for example):
-
$dict{key} = value; # NOTE curly braces {} to denote hash
|
The values can be used in ordinary arithmetic such as
-
$math{pi} = 3.14; $math{pi} += .0016; # sets $math{pi} = 3.1416;
-
either pi or 'pi' is allowed for specifying key
|
If key pimisspelt is undefined then $math{pimisspelt} returns undef and so one can easily see if a particular key has a value
|
Alternatively, the function exists($math{pimisspelt}) returns false unless key pimisspelt is present in the hash (it returns true even when the stored value is undef)
|
The order of storage of pairs in a hash is arbitrary and nonreproducible
-
one cannot push or pop an associative array
|
@listmime = %mime; # produces a list of form (key1,value1,key2,value2 ...)
-
This list can be manipulated like any list
-
One can also create a hash by defining such a list where adjacent elements are paired so that in above example
|
%newmime = @listmime; # creates a hash identical to %mime
|
One can delete specific pairs by delete command so for example:
-
%fred = (key1, "one", key2, "two"); # Quotes on key1 optional
-
delete $fred{key1}; # leaves %fred with one pair (key2,"two")
|
keys(%dict) returns a list (array) of keys in %dict (in arbitrary order). This can be used with foreach construct:
|
foreach (keys(%mime)) { # $_ is default loop variable
-
print "\$mime{$_} = $mime{$_}\n";
|
}
|
values(%dict) is typically less useful. It returns an unordered list of values (with repetition) in hash %dict
|
each(%dict) returns a single, two-element list containing the "next" key-value pair in %dict. For example,
|
while ( ($key,$val) = each(%dict) ) { ... }
|
Every call to each(%dict) returns a new pair until it finally returns an empty list. The next call to each() starts the cycle again...
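|
The hash operations above can be combined in one short script. This is a sketch; the contents of %mime are made-up sample data.

```perl
use strict;
use warnings;

my %mime = (html => "text/html", gif => "image/gif", txt => "text/plain");
$mime{jpg} = "image/jpeg";               # add a pair

my $has_gif = exists($mime{gif}) ? 1 : 0;
my $has_png = exists($mime{png}) ? 1 : 0;

delete $mime{txt};                       # remove a pair

my @listmime = %mime;                    # flattened (key, value, key, value, ...)
my %newmime  = @listmime;                # rebuilt from the flat list

# keys come back in arbitrary order, so sort them for stable output
foreach my $ext (sort keys %mime) {
    print "\$mime{$ext} = $mime{$ext}\n";
}

my $pairs = 0;
while ( my ($key, $val) = each(%newmime) ) {
    $pairs++;                            # one iteration per key-value pair
}
```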
|
Sequence is c1c2c3.. -- a sequence of single characters
|
c* means "zero or more" instances of character c
|
c+ means "one or more" instances of character c
|
c? means "zero or one" instances of character c
|
All matching is greedy -- the maximum number of chars are "eaten up" starting with leftmost matching character
-
In Perl5, use ? to override greedy matching of regex parser
-
.*?: matches to first : in line while .*: matches to last : in line.
|
Curly brace notation:
|
c{n1,n2} means from n1 to n2 instances of character c
|
c{n1,} means n1 or more instances of character c
|
c{n1} means exactly n1 instances of character c
|
c{0,n2} means n2 or fewer instances of character c
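|
A small sketch contrasting greedy and non-greedy matching, plus the curly brace counts. The sample line is illustrative.

```perl
use strict;
use warnings;

my $line = "root:x:0:0:root:/root";

my ($greedy) = $line =~ /^(.*):/;     # .* runs to the LAST colon
my ($lazy)   = $line =~ /^(.*?):/;    # .*? stops at the FIRST colon

my $two_to_three = ("aaaa" =~ /^a{2,3}$/) ? 1 : 0;   # four a's is too many
my $exactly_four = ("aaaa" =~ /^a{4}$/)   ? 1 : 0;
my $optional     = ("color" =~ /^colou?r$/ && "colour" =~ /^colou?r$/) ? 1 : 0;

print "greedy=$greedy lazy=$lazy\n";
```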
|
Parentheses can be used as "memory" for relating different parts of a match or for substitution
|
If subexpressions are enclosed in parentheses, the matched values are saved in temporary variables: \1, \2, etc. inside the pattern and $1, $2, etc. outside it
-
s/Geoffrey(.*)Fox/Geoffrey ($1) Fox/
-
when matched to 'Geoffrey Charles Fox' stores $1 = ' Charles ', which is transferred to the substitution string giving result 'Geoffrey ( Charles ) Fox'
-
Note: Use \1, \2, etc. inside pattern only; use $1, $2, etc. outside pattern (including the replacement half of s///)
|
Parentheses are also used to clarify the meaning of a regular expression. For instance,
-
/(a|b)*/ is different from /a|(b*)/
|
In regular expressions, variables are interpolated as in double-quoted strings. Use \$ to represent a literal dollar sign; an unescaped $ at the end of the pattern is the end-of-string anchor.
|
An integer with optional plus or minus sign
|
^(\+|-)?[0-9]+$ (may use [-+]? as well)
|
Double-quoted strings, with no nested double quotes
|
"[^"]*" (matches the empty string "")
|
Double-quoted strings, with Fortran-like nested quotes
|
"([^"]|"")*"
|
Double-quoted strings, with C-like nested quotes
|
"([^"]|\\")*" (bugged! Why?)
|
"(\\"|[^"])*" (better, but still flawed)
|
"(\\"|[^"\\])*" (works, but inefficient)
|
"([^"\\]|\\")*" (works efficiently!)
|
Double-quoted strings, with other escaped characters
|
"([^"\\]|\\.)*" (but not escaped newlines)
|
"([^"\\]|\\(.|\n))*"
|
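The patterns above can be tried directly. This sketch wraps each one in an extra pair of parentheses to capture the whole match, and uses made-up test strings. It shows one way the first C-like pattern goes wrong: [^"] also matches a backslash, so an escaped quote is taken as the end of the string.

```perl
use strict;
use warnings;

# Single-quoted Perl strings keep \" as two characters: backslash, quote.
my $text = 'say "a\"b" now';

my ($bugged)  = $text =~ /("([^"]|\\")*")/;      # first C-like pattern
my ($correct) = $text =~ /("([^"\\]|\\")*")/;    # efficient C-like pattern

my ($fortran) = 'x "say ""hi""" y' =~ /("([^"]|"")*")/;

my $is_int  = ("-42" =~ /^(\+|-)?[0-9]+$/) ? 1 : 0;
my $not_int = ("4.2" =~ /^(\+|-)?[0-9]+$/) ? 1 : 0;

print "bugged=$bugged correct=$correct fortran=$fortran\n";
```
|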
The result of the expression
|
$string =~ m/$regex/
|
is true if and only if the value of $string matches $regex.
|
For example,
|
if ( <STDIN> =~ m/^[Tt][Oo]:/ ) { ... }
|
matches if current input line starts with to: (any case)
|
Note: m/^to:/i is equivalent to above expression since modifier /i instructs pattern matcher to ignore case
|
Any delimiter may be used in place of the slash
|
m%^[Tt][Oo]:% # equivalent to previous expression
|
The m operator may be omitted, but then slash delimiters are required
|
The substitution operator s has the form:
|
$line =~ s/regex1/regex2/ ; # regex1 is a pattern; regex2 is the replacement string, not itself a regular expression
|
As with m, the operator s can use any delimiter and so
|
$line =~ s#regex1#regex2# ;
|
is an equivalent form
|
In the substitution s/regex1/regex2/g the /g causes substitution to occur at all possible places in string (normally only the first match is found)
|
Note that /i and /g can be used together
|
In an HTML doc, replace 2x2 with <NOBR>2 x 2 </NOBR>
|
Search: (\d+)x(\d+)
|
Replace: <NOBR>\1 x \2</NOBR>
|
PERL: s|(\d+)x(\d+)|<NOBR>$1 x $2</NOBR>|i
|
In code, replace an array subscript [...] with [n++]
|
Search: \[[^]]+\]
|
Replace: [n++]
|
PERL: s/\[[^]]+\]/[n++]/g
|
In an HTML doc, replace certain file references in URLs
|
Search: ys97_(\d\d)/
|
Replace: ys97_\1/index.html
|
PERL: s#ys97_(\d\d)/#ys97_$1/index.html#
|
Again in an HTML doc, replace certain paths in URLs
|
Search: ([^/])\.\./graphics
|
Replace: \1../../latex-graphics
|
PERL: s%([^/])\.\./graphics%$1../../latex-graphics%
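|
The four search-and-replace recipes above can be run on sample strings. This is a sketch; the input strings are invented for illustration, the replacements use $1 and $2, and the first substitution adds /g so that every occurrence is replaced.

```perl
use strict;
use warnings;

# (my $copy = $original) =~ s/.../.../ edits the copy, not the original.
my $html = "a 2x2 matrix and a 10X3 table";
(my $nobr = $html) =~ s|(\d+)x(\d+)|<NOBR>$1 x $2</NOBR>|gi;

my $code = "x[i] = y[j+1];";
(my $subs = $code) =~ s/\[[^]]+\]/[n++]/g;

my $url = "talks/ys97_03/";
(my $indexed = $url) =~ s#ys97_(\d\d)/#ys97_$1/index.html#;

my $path = "src=../graphics/fig.gif";
(my $moved = $path) =~ s%([^/])\.\./graphics%$1../../latex-graphics%;

print "$nobr\n$subs\n$indexed\n$moved\n";
```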
|
$loc = index($string, $substr); # returns in $loc the location (first character in $string is location 0) of first occurrence of $substr in $string.
-
If $substr is not found, index returns -1
|
$loc = index($string, $substr, $firstloc); # returns $loc which is at least as large as $firstloc
-
Use to find multiple occurrences, setting $firstloc as 1+ previously found location
|
rindex($string, $substr, $lastloc) is identical to index except scanning starts at right (end) of string and not at start. All locations still count from left but if you give a third argument $lastloc, the returned $loc will be at most $lastloc
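|
A sketch of the loop suggested above for finding every occurrence of a substring with index, followed by the matching rindex calls. The sample string is illustrative.

```perl
use strict;
use warnings;

my $string = "banana";
my @hits;

# Restart each search one character past the previous hit.
my $loc = index($string, "an");
while ($loc != -1) {
    push(@hits, $loc);
    $loc = index($string, "an", $loc + 1);
}

my $last   = rindex($string, "an");      # rightmost occurrence
my $before = rindex($string, "an", 2);   # rightmost occurrence at or before 2
my $none   = index($string, "xyz");      # not found

print "hits=@hits last=$last before=$before none=$none\n";   # prints: hits=1 3 last=3 before=1 none=-1
```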
|
We have already seen equality operators
-
== ,!= for numerically equal, unequal
-
eq , ne for stringwise equal, not equal
|
$a <=> $b returns -1,0,1 depending if $a is respectively numerically less than, equal to, or greater than $b
|
$a cmp $b returns -1,0,1 depending if $a is respectively stringwise less than, equal to, or greater than $b
|
sort() is a builtin PERL function with three modes:
|
@result = sort @array; # equivalent to sort { $a cmp $b} @array;
|
which sorts the variables in @array using stringwise comparisons, returning them in @result
|
@result = sort BLOCK @array; # where statement BLOCK enclosed in {} curly brackets returns -1, 0, 1 given values of $a, $b
|
@result = sort { $age{$a} <=> $age{$b} } @array; # sorts by age if entries in @array are keys to hash %age, which holds numeric age for each key
|
@result = sort SUBNAME @array; # uses subroutine (which can be specified as value of scalar variable) to perform sorting
|
sub backsort { $b <=> $a; } # Reverse order for integers
|
@result = sort backsort @array; # sorts in numerically decreasing order
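|
The three modes of sort can be compared on the same data. This is a sketch; @nums and %age hold made-up values.

```perl
use strict;
use warnings;

my @nums = (10, 9, 100, 1);

my @by_string  = sort @nums;                 # default: stringwise (cmp)
my @by_number  = sort { $a <=> $b } @nums;   # BLOCK: numeric ascending

sub backsort { $b <=> $a; }                  # SUBNAME: numeric descending
my @descending = sort backsort @nums;

my %age    = (ann => 31, bob => 7, cy => 19);
my @by_age = sort { $age{$a} <=> $age{$b} } keys %age;

print "string=@by_string number=@by_number desc=@descending age=@by_age\n";
```

Note how the default stringwise sort places 100 before 9, which is why numeric data needs an explicit <=> comparison.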
|
tr/ab/XY/ translates a to X and b to Y in string $_
|
As for m and s, one can apply tr to a general string with =~
|
$string =~ tr/a-z/A-Z/; # translates letters from lower to upper case in $string
|
Note use of - to specify range as in regular expressions, although tr does NOT use regular expressions
|
tr can count and return number of characters matched
|
$numatoz = tr/a-z//; # $numatoz holds number of lower case letters in $_
|
if final string empty no substitutions are made
|
if second string shorter than first, the last character in second string is repeated
|
tr/a-z/A?/; # replaces a by A and all other lower case letters by ?
|
if the /d option used, unspecified translated characters are deleted
|
tr/a-z//d; # deletes all lower case letters
|
the /c option complements characters in initial string
|
tr/a-zA-Z/_/c; # translates ALL nonletters into _
|
the /s option squeezes each run of consecutive identical translated characters in the final string into a single copy
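|
Each tr form described above, applied to a copy of one sample string. This is a sketch; the sample text is illustrative.

```perl
use strict;
use warnings;

my $s = "Hello, World 42";

(my $upper = $s) =~ tr/a-z/A-Z/;         # lower case to upper case
my $lower_count = ($s =~ tr/a-z//);      # count only; $s is unchanged

(my $masked     = $s) =~ tr/a-z/A?/;     # a becomes A, other lower case becomes ?
(my $deleted    = $s) =~ tr/a-z//d;      # /d deletes the lower case letters
(my $nonletters = $s) =~ tr/a-zA-Z/_/c;  # /c: every NON-letter becomes _

(my $squeezed = "aabbccdd") =~ tr/a-c//s;   # /s: runs of a, b, c collapse

print "$upper\n$lower_count\n$masked\n$deleted\n$nonletters\n$squeezed\n";
```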
|
There are ways of performing simple tests that require fewer curly braces and other punctuation
|
expr1 if testexp; # is equivalent to
|
if (testexp) {
|
expr1;
|
}
|
last, redo, and next can be followed by such tests e.g.
|
last DOREALWORK if userendofinitializationhit ;
|
There are similar abbreviations for unless,while,until
|
dothisexpression unless conditionholds;
|
dostandardstuff while normalconditionholds;
|
dostandardstuff until specialconditionseen;
|
thatcommand if thiscondition; # is equivalent to
|
thiscondition && thatcommand;
|
PERL will not continue with && (logical and) if it finds a false condition. So if thiscondition is false, thatcommand is not executed
|
Similarly:
|
thatcommand unless thiscondition; # is equivalent to
|
thiscondition || thatcommand;
|
Note can use and instead of && and or instead of ||
-
not (instead of !) and xor (instead of ^) also allowed
|
We can use a C-like if-expression
|
expression ? Truecalc : Falsecalc; # which is equivalent to
|
if (expression) { Truecalc; } else { Falsecalc; }
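|
The statement modifiers and the ?: expression above, in one runnable sketch. The loop bounds and variable names are illustrative.

```perl
use strict;
use warnings;

my @seen;
for my $n (1 .. 10) {
    next if $n % 2;          # skip odd numbers
    last unless $n < 8;      # leave the loop once $n reaches 8
    push(@seen, $n);
}

my $i = 0;
$i++ while $i < 5;           # repeat while the condition holds

my $j = 10;
$j-- until $j <= 7;          # repeat until the condition holds

my $label = @seen > 2 ? "many" : "few";   # C-like if-expression

my $flag = 0;
$flag = 1 if @seen;          # modifier form
@seen and $flag = 2;         # equivalent short-circuit form

print "seen=@seen i=$i j=$j label=$label flag=$flag\n";   # prints: seen=2 4 6 i=5 j=7 label=many flag=2
```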
|
Filehandles are, like statement labels, designated by a string without a special initial character. It is recommended that you use all capitals for such names
|
STDIN STDOUT STDERR (and diamond <> null name) have been introduced and correspond to UNIX stdin, stdout and stderr (and concatenation of argument files if <> operator)
|
Filehandles allow you to address general files and the syntax is similar to UNIX standard I/O (stdio.h) support
-
open(FILEHANDLE, "unixname"); # opens file unixname for reading -- can use <
-
open(FILEHANDLE, ">unixname"); # opens file unixname for writing
-
open(FILEHANDLE, ">>unixname"); # opens file unixname in append mode
|
close(FILEHANDLE); # closes file
|
Errors are handled with die construct:
|
open(FH, '>' . $criticalfile) || die("Print an error message if file can't be opened\n"); # Note how we add '>' (or '>>') to file name stored in Perl variable
|
As illustrated, <FILEHANDLE> reads either a single line or the full file depending on whether one stores the result in a scalar or an array variable
|
print FILEHANDLE list; # writes list onto FILEHANDLE
|
print list; # is equivalent to
|
print STDOUT list;
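|
A sketch of the write, append, and read modes above. It assumes the core File::Temp module is available and uses it only to obtain a scratch file name; the handles OUT and IN are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $file) = tempfile(UNLINK => 1);   # scratch file, removed at exit
close($tmp);

open(OUT, ">$file") || die("Cannot write $file: $!\n");
print OUT "first line\n", "second line\n";
close(OUT);

open(OUT, ">>$file") || die("Cannot append to $file: $!\n");
print OUT "third line\n";
close(OUT);

open(IN, $file) || die("Cannot read $file: $!\n");
my $one  = <IN>;     # scalar context: the next line only
my @rest = <IN>;     # list context: all remaining lines
close(IN);

chomp($one);
print "read '$one' plus ", scalar(@rest), " more lines\n";
```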
|
There is a whole set of test operators, which are usually applied to file NAMES (they also accept file HANDLES)
|
-e $filename returns true if $filename EXISTS
|
-r $filename returns true if $filename is READABLE
|
-w $filename returns true if $filename is WRITABLE
|
-x $filename returns true if $filename is EXECUTABLE
|
chdir($name); transfers to directory specified in $name
|
mkdir($name, mode); # makes directory with given name $name and MODE (typically 3 octal characters such as 0755)
|
opendir(DIRHANDLE, $name); # opens directory with directory handle DIRHANDLE. Such names can be assigned independently of all other names and are in particular not connected with FILEHANDLEs
|
closedir(DIRHANDLE); # closes directory associated with handle DIRHANDLE
|
readdir(DIRHANDLE); # returns file names (including . and ..) in directory with handle DIRHANDLE
-
If scalar result, readdir returns "next" file name
-
If array result, readdir returns all file names in directory
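|
The file tests and directory functions above can be exercised in a scratch directory. This sketch assumes the core File::Temp module for the temporary directory; the entry names sub and a.txt are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);            # scratch directory, removed at exit

mkdir("$dir/sub", 0755) || die("mkdir failed: $!\n");
open(F, ">$dir/a.txt")  || die("open failed: $!\n");
close(F);

opendir(DIR, $dir) || die("opendir failed: $!\n");
my @all = readdir(DIR);                     # list context: every entry
closedir(DIR);

# readdir includes . and .. so filter them out, then sort for stable order
my @names = sort grep { $_ ne "." && $_ ne ".." } @all;

my $exists   = (-e "$dir/a.txt") ? 1 : 0;
my $readable = (-r "$dir/a.txt") ? 1 : 0;
my $absent   = (-e "$dir/nosuchfile") ? 1 : 0;

print "names=@names exists=$exists readable=$readable absent=$absent\n";
```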
|
system("shellscript"); # dispatches shellscript to be executed by /bin/sh and anything allowed by shell is allowed in argument
-
system returns the status of shellscript: 0 on success; otherwise the exit code is found in the return value shifted right by 8 bits
|
system("date > tempfil"); # executes UNIX command date returning standard output from date to file tempfil in current directory
|
system("rm *") && die ("not allowed\n"); # terminates if error in system call as shell programs return nonzero if failure (opposite of open and most PERL commands)
|
Variable interpolation is done in double-quoted arguments:
|
$prog = "nobel.c"; system("cc -o $prog"); # (I) is equivalent here to
|
$ccompiler="cc";
|
system($ccompiler,"-o","nobel.c"); # (II) but in general not identical as in first form (I) shell interprets command list but in second form (II) the arguments are handed directly to command given in first entry in list given to system
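|
A sketch of the list form of system and of decoding its return value. To stay self-contained it runs the Perl interpreter itself (its path is in the special variable $^X) instead of a compiler; the exit codes are arbitrary.

```perl
use strict;
use warnings;

# List form: arguments go straight to the program, with no shell involved.
my $status = system($^X, "-e", "exit 3");
my $code   = $status >> 8;                  # exit code is in the high byte

my $ok = system($^X, "-e", "exit 0");       # 0 means success

# Because 0 means success, test with && rather than ||.
system($^X, "-e", "exit 0") && die("child failed\n");

print "status=$status code=$code ok=$ok\n";
```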
|
%ENV is set as the shell environment in which the Perl program was invoked
|
Any UNIX processes invoked by system, fork, backquotes, or open inherit the environment specified by %ENV at invocation of the child process
|
One can change %ENV in the same way as any hash:
|
%ENVIN = %ENV ; $oldpath = $ENV{"PATH"}; # saves input environment
|
$ENV{"PATH"} = $oldpath . ":/web/cgi"; # resets PATH to include an extra directory to be used by child process -- later we run
|
%ENV=%ENVIN; # Restores original environment
|
One can see what has been passed in %ENV by using Perl keys function
|
foreach $key (sort keys %ENV ) {
|
print "$key = $ENV{$key}\n"; # both $key, $ENV{} are interpolated
|
}
|
fork is the most powerful method of process creation: it creates two identical copies of the program -- parent and child
|
unless (fork) { ;} # child indicated by fork=0
|
; # otherwise fork=child process number for parent
|
The child program typically invokes exec which replaces child original by the argument of exec. Meanwhile parent should wait until this exec is complete and child has gone away.
|
unless (fork) {
-
exec("date"); # child process becomes date command sharing environment with parent
|
}
|
wait; # parent process waits until date is complete
|
The child process need not terminate naturally as with exec() and if child code was for instance
|
print FILEHANDLE @hugefile; # in parallel with parent
|
exit; # is required else child will continue with parents code whereas we wanted parent and child to work in parallel on separate jobs
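|
A minimal sketch of the fork, exit, and wait pattern above on a UNIX system. The child here does no real work and just exits with an arbitrary code of 7 so that the parent has something to collect.

```perl
use strict;
use warnings;

my $pid = fork();
die("fork failed: $!\n") unless defined($pid);

if ($pid == 0) {
    # Child: fork returned 0. Do the child's job, then exit explicitly
    # so the child never falls through into the parent's code.
    exit(7);
}

# Parent: fork returned the child's process number.
my $reaped     = wait();       # blocks until the child has finished
my $child_code = $? >> 8;      # child's exit code

print "child $reaped exited with code $child_code\n";
```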
|
The hash %SIG is used to define signal handlers (subroutines) used for various signals
|
The keys of %SIG are the UNIX names with first SIG removed. For instance, to set handler() as routine that will handle SIGINT interrupts do something like:
|
$SIG{'INT'} = 'handler';
|
sub handler { # First argument is signal name
-
my($sig) = @_;
-
print("Signal $sig received -- shutting down\n");
-
exit(0);
|
}
|
kill $signum, $child1, $child2; # sends interrupt $signum to process numbers stored in $child1 and $child2
|
$signum is NUMERICAL label (2 for SIGINT) and $child1,2 the child process number as returned by fork or open(PROCESSHANDLE,..) to parent
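|
A sketch of installing a handler and triggering it. For a self-contained demonstration the process signals itself ($$ is its own process number) with SIGUSR1 instead of SIGINT, and the handler records the signal name in the illustrative variable $received. It assumes a UNIX system and uses a signal name where the text above uses a number, which kill also accepts.

```perl
use strict;
use warnings;

my $received = "";

sub handler {                  # first argument is the signal name
    my ($sig) = @_;
    $received = $sig;
}

$SIG{'USR1'} = \&handler;      # key is the UNIX name without the SIG prefix

kill('USR1', $$);              # send SIGUSR1 to this process
sleep(1) unless $received;     # give the handler a moment if still pending

print "Signal $received received\n";
```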
|
As in other interpreters (JavaScript, e.g.), PERL allows you to execute a string using the eval function
|
Suppose you have two arrays @fred and @jim and want to assign $value to element $index of the array whose name ('fred' or 'jim') is held in the string $name (which could be input). This can be achieved by:
|
eval('$' . $name . '[' . $index . '] = $value;');
|
eval returns result of evaluating its argument
|
In this case, you can achieve the same results with indexed hashes:
|
$options[$index]{$name} = $value;
|
using multidimensional array notation
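|
A sketch of both approaches above. The eval version builds the whole assignment as a string; because $name may come from input, the sketch first checks it against the two expected names. The check and the hash %arrays are additions made for illustration.

```perl
use strict;
use warnings;

our (@fred, @jim);                          # package arrays visible to eval

my ($name, $index, $value) = ("jim", 2, "hello");

# Never eval unchecked input: allow only the two known array names.
die("unexpected array name\n") unless $name =~ /^(fred|jim)$/;

eval('$' . $name . '[' . $index . '] = $value;');   # runs: $jim[2] = $value;
die($@) if $@;                              # $@ holds any error from eval

# The same effect without eval, using a hash of arrays:
my %arrays = (fred => [], jim => []);
$arrays{$name}[$index] = $value;

print "jim[2]=$jim[2] arrays{jim}[2]=$arrays{jim}[2]\n";
```

The hash form is usually preferable because no code is compiled at run time.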
|
CGI.pm, a Perl5 module (by Lincoln Stein) used to write CGI scripts, is documented in Ch. 19 of Learning Perl (second edition). See the CGI.pm man page for details:
|
% man CGI # read CGI man page
|
CGI.pm is compatible with the Perl4 library cgi-lib.pl:
|
require "cgi-lib.pl";
|
&ReadParse; # initialize global hash %in
|
print "The value of 'foo' is $in{foo}.\n";
|
is equivalent to
|
use CGI qw(:cgi-lib);
|
&ReadParse; # initialize global hash %in
|
print "The value of 'foo' is $in{foo}.\n";
|
Other cgi-lib.pl functions available in CGI.pm:
|
PrintHeader() HtmlTop() HtmlBot()
|
SplitParam() MethGet() MethPost()
|
Groups of CGI.pm methods are loaded via import tags:
|
:cgi argument-handling methods such as param()
|
:form HTML form tags such as textfield()
|
:html2 all HTML 2.0 tags
|
:html3 all HTML 3.0 tags, including <TABLE>
|
:netscape Netscape HTML extensions
|
:shortcuts equivalent to qw(:html2 :html3 :netscape)
|
:standard equivalent to qw(:html2 :form :cgi)
|
:all all of the above
|
Examples:
|
use CGI; # must use object-oriented syntax!
|
use CGI qw(:standard);
|
use CGI qw(:standard :html3);
|
Here's an example how to use CGI.pm to process form data:
|
print header(-type=>'text/html'); # MIME header
|
print start_html( # first few lines of HTML
-
-title=>'Pizza Order Form',
-
-BGCOLOR=>'#ffffff'
-
);
|
print h1( 'Pizza Order' );
|
print h3( "$TOPPING pizza" ); # $TOPPING = param(topping);
|
print p( "Deliver to: <B>$ADDR</B>" ); # $ADDR = param(address);
|
print p( "Telephone: <B>$PHONE</B>" ); # $PHONE = param(phone);
|
my $date = `date`; chomp($date);
|
print p( "Order came in at $date" );
|
print hr();
|
# Print a link:
|
print 'Return to ';
|
print a({href=>"fill-out-form.pl"}, # an anonymous hash
|
print end_html(); # last few lines of HTML
|
Here's an example of a form generated by CGI.pm function calls:
|
print start_form( # <FORM> tag
-
-method=>'POST', # default
-
-action=>'fill-out-form.pl'
-
);
|
print p( "Type in your street address:\n",
-
textfield( # a textfield
-
-name=>'address', -size=>36
-
)
-
);
|
print p( 'What kind of pizza would you like?' );
|
print blockquote( # requires :shortcuts
-
radio_group(
-
-name=>'topping',
-
-values=>['Pepperoni','Sausage','Anchovy']
-
) # an anonymous array
-
);
|
print p( 'To place your order, click here:',
-
submit('Order Pizza'), # submit button
-
reset('Clear') # reset button
-
);
|
print end_form(); # </FORM> tag
|