Given by Geoffrey C. Fox, Nancy McCracken, Tom Scavo at Computational Science for Information Age Course CPS616 on Sept 20 97. Foils prepared Sept 20 97
Summary of Material
See Perl Home Page http://www.perl.com/ for background information and resources such as manual! |
This foilset mainly extends the previous Perl Overview with a discussion of some key Perl5 capabilities. However, some topics may be advanced Perl4 features |
We give an initial summary of Perl5 changes and then discuss: |
Some old and new functions in Perl |
Regular expression enhancements |
New syntax, especially -> and => |
New subroutine calling and declaration syntax |
Hard (address) and soft (symbol table) references |
General data structures, including multidimensional arrays |
Object-oriented features: packages, classes, and methods |
Instructors: Geoffrey Fox, Wojtek Furmanski |
Updated: September 1997 |
Syracuse University |
111 College Place |
Syracuse |
New York 13244-4100 |
Arbitrary multidimensional arrays for list [] and hash {} sectors |
Modules allow convenient library structure |
Functional but slightly ad hoc object-oriented class structure hacked onto existing Perl4 |
C/C++ can be called from Perl and vice versa |
tie/untie allow more general database interfaces |
AUTOLOAD allows arbitrary action for undefined subroutines |
Now subroutines can be predeclared before implementation |
Significant regular expression enhancements
|
Full support for pointers (references) |
Various useful new routines such as my, qq, qw, q, quotemeta, lc, uc, lcfirst, ucfirst |
Many new pragmas: |
(use English; use strict qw(vars refs subs);) |
New operator => for specifying keyword-value pairs: |
%hash = ( 'key1', 'value1', 'key2', 'value2' ); |
# is equivalent to |
%hash = ( 'key1' => 'value1', 'key2' => 'value2' ); |
New operator -> is dereferencing operator
|
In hash arrays, quotes are now optional if unambiguous, i.e. if couldn't be an expression |
$days{'Feb'} and $days{Feb} are the same! |
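A minimal sketch of the two points above, assuming nothing beyond core Perl5: => behaves like a comma (and also quotes a simple word on its left), and a simple identifier inside { } needs no quotes. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Two ways to write the same hash: plain commas, or => between key and value.
my %with_commas = ( 'key1', 'value1', 'key2', 'value2' );
my %with_arrows = ( key1 => 'value1', key2 => 'value2' );  # => quotes the bareword on its left

# Inside { }, a simple identifier needs no quotes.
my %days = ( Jan => 31, Feb => 28 );
my $same = ( $days{'Feb'} == $days{Feb} ) ? "same" : "different";

print "$with_commas{key2} $with_arrows{key2} $same\n";
```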
package fred; |
$var = 3.14159; # fred is now the current package and $var is a variable in it, so code outside package fred accesses this variable by |
$globalaccess = $fred::var; |
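Package names can be nested with ::, but the full name must be written out; a package statement placed after package fred does not nest inside it automatically |
package fred; |
.................... |
package fred::jim; |
$var = 3.14159; # and now we use syntax |
$globalaccess = $fred::jim::var; |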
Modules are packages used as libraries |
To reference package HTML::FormatPS |
Typically you define a file called |
FormatPS.pm -- note .pm NOT .pl -- in directory HTML |
Note use of UNIX directory structure and file names to support logical object structure of software -- we saw this quaint (but universal) convention when discussing Java |
The file FormatPS.pm module starts with following lines |
package HTML::FormatPS; |
$DEFAULT_PAGESIZE = "A4"; # A Package variable |
# Followed by pod (plain old documentation) syntax, introduced with Perl5, for automatic generation of documentation |
=head1 NAME This and following lines are JUST documentation and ignored by the interpreter |
This is title of documentation |
HTML::FormatPS - Format HTML as postscript |
# Continued on the next foil ...... |
=head1 SYNOPSIS |
require HTML::FormatPS; # This part of documentation defines use of Module |
$html = HTML::Parse::parse_htmlfile("test.html"); # access function parse_htmlfile in file Parse.pm in directory HTML |
Now define an object of class FormatPS which holds parameters of relevance -- $formatter will hold pointer to new object |
$formatter = new HTML::FormatPS
|
print $formatter->format($html); # run format method in class FormatPS to produce postscript output |
The start of pod information is recognized by a command starting with = in column1: |
We give HTML approximate equivalents to give intuition! |
=head1 heading Roughly equivalent to <h1> heading </h1> |
=head2 heading Roughly equivalent to <h2> heading </h2> |
=over N Indent by N characters |
=item text Roughly <li> text |
=back Roughly </ul> |
=cut End Pod Sequence |
One uses I<text> to get italic i.e. <i>text</i> in HTML |
B<text> for Bold, L<text> for link etc. |
See perlpod manual page and look at Perl library code which uses pod notation to generate manual |
require Cwd; # Makes notation Cwd:: accessible |
$here = Cwd::getcwd(); # correctly accesses function getcwd in module Cwd |
$here = getcwd(); # looks for getcwd in current package and probably fails! |
On the other hand: |
use Cwd; # Actually imports names (symbol table) from Cwd and |
$here = getcwd(); # is equivalent to $here = Cwd::getcwd(); |
require can also load ordinary Perl program files -- not only packages
|
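The difference between require and use can be sketched with the standard Cwd module, which exports getcwd by default. This is only an illustration; the variable names are not from the original foils.

```perl
use strict;
use warnings;

require Cwd;                       # loads Cwd at run time; nothing is imported
my $qualified = Cwd::getcwd();     # the fully qualified name always works

use Cwd;                           # loads at compile time and imports Cwd's default exports
my $imported = getcwd();           # the short name now resolves to Cwd::getcwd

print "same directory\n" if $qualified eq $imported;
```

Because use is processed at compile time, the import is in place before any statement in the file runs, whatever its position in the source.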
Perl5 has a rich and at times rather confusing syntax for references or pointers |
This new feature is used to allow Perl variables to hold "handles" to objects and so implement an object-oriented environment. |
So in this sense pointers in Perl combine and do not "properly" distinguish classic pointers and objects. |
There are hard references and soft or symbol table references |
Hard references are new to Perl5 and much more powerful |
One of Perl's "problems" (also its strength if you are knowledgeable) is that one often needs to understand implementation issues to use effectively |
Every package has a symbol table (i.e. a list of used symbols), a hash named by the package name followed by ::, so that main symbol table is |
%main:: and variable $var in main has symbol table entry $main::{'var'} |
*var is equivalent to $main::{'var'} |
If the symbol $original exists, we can set |
*var = *original; # and then $var is another 'name' for $original and @var is another name for @original, etc.
|
We can more miraculously set |
$name="foo"; # define an innocent ascii string |
${$name} = 6; # sets $foo=6 as though $name was a symbolic reference |
$$name = 6; # also sets $foo=6 |
$name->[0] = 4; # sets $foo[0] = 4 |
${$name x 2} = 6; # sets $foofoo = 6 ; # remember definition of x for strings |
@$name = (); # sets @foo to null list while |
&$name(arguments); # calls subroutine foo with given arguments! |
use strict 'refs'; # FORBIDS symbolic references and above syntax will lead to error messages |
*PI =\3.14159; # ensures that $PI is set in a way that you can not override it!
|
Hard references are more powerful than typeglobs (symbolic references) and in some cases supersede them |
$scalarref = \$foo; # is pointer to a scalar |
$getit = $$scalarref; # is same as $getit = $foo |
This is called dereferencing or going from pointer (reference) to value |
$arrayref = \@array; |
$hashref = \%hash;
|
$coderef = \&subroutine; # pointer to a subroutine!
|
$globref = \*STDOUT;
|
Note in dereferencing, one can use curly braces {} either to disambiguate, or to replace the scalar holding the hard reference by a BLOCK that returns a reference of the correct type |
Thus $$scalarref is equivalent to ${$scalarref}; |
$$hashref{"key"} is equivalent to ${$hashref}{"key"} or $hashref->{"key"} |
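The reference and dereference forms listed above can be tried together in one short script. This is a sketch with illustrative names (greet, $second and so on are not from the foils).

```perl
use strict;
use warnings;

my $foo   = 42;
my @array = (10, 20, 30);
my %hash  = ( key => 'value' );
sub greet { return "hello $_[0]" }

my $scalarref = \$foo;
my $arrayref  = \@array;
my $hashref   = \%hash;
my $coderef   = \&greet;

my $getit  = $$scalarref;          # same as $foo
my $second = $$arrayref[1];        # same as ${$arrayref}[1] and $arrayref->[1]
my $value  = $hashref->{key};      # same as $$hashref{key} and ${$hashref}{key}
my $called = &$coderef('world');   # same as $coderef->('world')

$$scalarref = 43;                  # assigning through the reference changes $foo itself

print "$getit $second $value $called $foo\n";
```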
Often one wishes to construct "unnamed" data structures or subroutines where one keeps track of them by reference as opposed to name |
This is natural with subroutines which return either a data structure or subroutine |
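$arrayref = [1, 2, ['a', 'b', 'c'] ]; # $arrayref is a hard reference to an anonymous array of 3 elements whose last element is itself a reference to an anonymous array, i.e. a ragged 2D structure holding 5 scalar values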
|
$secretsub = sub { print "Support ECS\n" }; |
executed by &$secretsub; |
Read Chapter 4 of Camel Book (second edition) or manual pages on PerlLOL (List of Lists) and PerlDSC (Data Structures Cookbook) for many excellent examples |
$LoL3D[$x][$y][$z] = scalar func($x,$y,$z); # scalar forces scalar context |
The above is a classic (Fortran-like) 3D array except it need NOT be predefined, there are no dimensions, and Perl arrays can be ragged with $x=1 having different $y, $z ranges from $x=2, etc
|
We can also define $indexed2Dhash[$x][$y]{$z} which should be thought of as a hash labelled by two-dimensional indices |
$hashof2Darray{$x}[$y][$z] should be thought of as a hash whose value is a 2D array |
One can freely use such data structures as long as you use "full" number of indices |
Issues that require understanding of implementation occur when you need to manipulate structure "as a whole" with less than full number of indices |
All multi-dimensional data structures are implemented as arrays of references |
for $i (1..10) { |
@list = somefunc($i); # grab a list labelled by $i |
# Compute the number of elements in @list: |
$LoL[$i] = scalar @list; |
# Create a fresh 2D array for each $i: |
$LoL2D[$i] = [ @list ]; # use array constructor [ ] |
} # End for loop |
my(@list) = somefunc($i); # my() creates a fresh instance each time |
$LoL2D[$i] = \@list; # also works but is perhaps less clear |
Note my() can occur inside any block { } (not just at start of subroutine) and defines variables local to the block |
Without my(), the line |
$LoL2D[$i] = \@list; |
also creates a 2D array, but \@list is the same location each time and so $LoL2D[$i][$j] gives the same answer (i.e., the final @list returned) regardless of the value of $i |
In $LoL2D[$x][$y] one stores an array labelled by $x of hard references |
Each hard reference is to an anonymous 1D array whose elements are accessed by $y |
$LoL2D[$i][$j] can be written equivalently as $LoL2D[$i]->[$j] but |
NOT $LoL2D->[$i]->[$j] or $LoL2D->[$i][$j] |
as left hand side of -> MUST be a reference and NOT an array or hash |
$ref_to_LoL2D = \@LoL2D; # is allowed and now |
access by $ref_to_LoL2D->[$i][$j] or $ref_to_LoL2D->[$i]->[$j] |
Note [ .. ] or { .. } create anonymous arrays or hashes respectively which can be assigned to a reference and then dereferenced by -> |
( .. ) constructs a list which can be assigned to an Array or Hash |
@LoL2D = ( [1,2], [1,2,3] ); # Constructs a 2D array |
$ref_to_LoL2D = [ [1,2], [1,2,3] ]; # creates a pointer to a 2D array |
$arraypt =\@{$LoL2D[$i]}; # takes a reference to the $i'th row of @LoL2D (the same reference that $LoL2D[$i] already holds) |
$arraypt->[$j] (or $$arraypt[$j]) is equivalent to $LoL2D[$i][$j] |
Perl5 is operationally like C rather than Fortran: it acts as though the rightmost index is least significant and its elements are stored "consecutively" |
If one has defined attributes for students with components such as $student{"grade"}, then $student[$classmember]{"grade"} is way to address a class of students |
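The difference between copying a list into a fresh anonymous array and repeatedly storing a reference to one named array can be checked with a small script. This is a sketch only: somefunc here is a hypothetical stand-in that returns $i copies of $i, and the array names are illustrative.

```perl
use strict;
use warnings;

# somefunc is a hypothetical stand-in: it returns a list of $i copies of $i.
sub somefunc {
    my $i = shift;
    my @copies = ($i) x $i;
    return @copies;
}

my (@copied, @shared, @list);
for my $i (1..3) {
    @list = somefunc($i);
    $copied[$i] = [ @list ];   # [ ] builds a new anonymous array holding a copy
    $shared[$i] = \@list;      # every row points at the one array @list
}

my $copied_len = scalar @{ $copied[1] };   # row 1 kept its own single element
my $shared_len = scalar @{ $shared[1] };   # row 1 now sees the final @list
my $row        = $copied[3];               # a whole row is just an array reference
my $element    = $row->[2];                # same as $copied[3][2]

print "$copied_len $shared_len $element\n";
```

Declaring @list with my() inside the loop body would give each pass its own array, and then \@list would be as safe as [ @list ].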
Subroutines must typically be predefined (with new sub command) if they are to be accessed with |
subname(list); # or one can use |
&subname(); # as equivalent to subname() |
so that & notation is typically unnecessary |
use packagename qw( NAME1 NAME2 NAME3); # imports routines NAME1 NAME2 NAME3 from package packagename |
Notice qw() is new Perl5 routine to generate quotes around space separated words
|
One can predeclare subroutines with |
sub name; # may be used before implementation appears |
Subroutines may be defined anonymously: |
sub newprint { |
my $x = shift; |
# return anonymous subroutine: |
return sub { my $y=shift; print "$x $y!\n"; }; |
} |
$h = newprint("Howdy"); # store anonymous subroutine |
&$h("World"); # call anonymous subroutine, which prints |
"Howdy World!" |
Note the $x in anonymous subroutine is private and retains value "Howdy" in $h even when newprint is called again: |
$g = newprint("Hello"); # $g has separate instance of $x |
Note differences between my() and local() |
my($x); # declares $x to be private to the block |
local($x); # declares $x to be known to this block and all routines invoked within the block |
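A short sketch of the difference, with illustrative subroutine names. It assumes Perl 5.6 or later for our; older Perl5 code would declare the package variable with use vars qw($x) instead.

```perl
use strict;
use warnings;

our $x = "global";                  # a package variable, so local() can act on it

sub show { return $x; }             # reads whatever the package variable holds right now

sub with_local { local $x = "dynamic"; return show(); }  # show() sees the temporary value
sub with_my    { my    $x = "lexical"; return show(); }  # show() cannot see this private $x

my $seen_local = with_local();
my $seen_my    = with_my();
my $after      = show();            # the change made by local was undone on exit

print "$seen_local $seen_my $after\n";
```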
Typeglob or symbolic reference can be used to pass arguments by reference and not by value |
This has usual advantage that subroutine alters "global" and not a "local" copy -- especially relevant for complex data structures where you do not want expense of copying
|
sub doublearray { |
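local(*arraypointer) = @_; # a typeglob cannot be declared with my(), so local() is used |
foreach $elem (@arraypointer) { |
$elem *= 2; # $elem aliases each element of the caller's array, so this doubles it in place |
} |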
} # End routine to double elements of an array |
# Suppose @foo and @bar are arrays: |
doublearray(*foo); # doubles elements of @foo |
doublearray(*bar); # doubles elements of @bar |
There is a well-known problem with ordinary ways of using Perl subroutine arguments |
If one has an argument list such as |
(@list1, @list2, .... ), then the subroutine sees a single list (array), which is the concatenation of the component lists |
This can be avoided using hard references with the \ operator. For example: |
@tailings = popmany( \@a, \@b, \@c, \@d ); |
See next foil for code of popmany |
sub popmany { # See previous foil for use |
my $aref; # A local scalar to hold pointer to array |
my @retlist = (); # An array to hold returned list |
# Pop last element in each input array: |
foreach $aref ( @_ ) { # loop over arguments |
push @retlist, pop @$aref; # remove last element of this array and save it |
} |
# @retlist holds last element of each array passed |
return @retlist; |
} |
One can define a default function AUTOLOAD to resolve unsatisfied subroutine references in a given (set of) packages |
You set up AUTOLOAD to deal with this case in whatever way you want! |
AUTOLOAD is passed arguments that were passed to called subroutine and name of unsatisfied external is in variable $AUTOLOAD |
sub AUTOLOAD { # Call UNIX for unsatisfied externals |
my $program = $AUTOLOAD; |
$program =~ s/.*:://; # remove any package precursors |
system($program, @_); |
} |
date(); # will be executed correctly by above AUTOLOAD |
The object model in Perl5 is not as clear as in Java as the concepts are mixed up with the implementation. |
We see same flaws in JavaScript where we "violate" modular programming principles by mixing concept and implementation in the technology which is precisely designed to help programmer keep these separate in his or her own programs! |
Objects are references -- not directly variables -- they are typically references returned from subroutines as anonymous datastructures
|
Objects are further:
|
A class method is a conventional Perl5 subroutine defined in given class (package) which expects its first argument to be either an object reference or for static methods (independent of object instance) the class name |
The class name IS the package name |
$formatter = new HTML::FormatPS(FontFamily => 'Helvetica', PaperSize => 'Letter'); # same as HTML::FormatPS->new(...): creates an instance of the class, whose name is passed as the first argument, with the following arguments overriding default parameters |
$formatter holds a reference to a blessed hash |
remember => is just a comma and arguments to a Perl subroutine are just a single list -- here of 5 entities |
In package HTML::FormatPS subroutine new looks like this (continued on next foil) |
sub new {
|
# Set up defaults in hash $self which is blessed |
my $self = bless {
|
}, $class; # second argument to bless is class name |
$self->papersize($DEFAULT_PAGESIZE); # To be Continued on next foil |
# Parse constructor arguments (might override defaults) |
while (($key, $val) = splice(@_, 0, 2)) { # get in $key,$value next two elements from @_ and remove them from @_
|
} |
return($self); # return datastructure |
} |
splice ARRAY,OFFSET,LENGTH,LIST |
remove LENGTH elements starting at position OFFSET in ARRAY and replace by elements ( if any) in LIST |
shift(@a) is equivalent to splice(@a,0,1) |
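A small sketch of splice in both roles: taking key/value pairs off the front of an argument list, as the constructor loop does, and replacing a range with a new list. The data values are illustrative.

```perl
use strict;
use warnings;

my @args = ( FontFamily => 'Helvetica', PaperSize => 'Letter' );

# Take key/value pairs off the front two at a time.
my %seen;
while ( my ($key, $val) = splice(@args, 0, 2) ) {
    $seen{$key} = $val;
}

# With a LIST, splice replaces what it removes.
my @a = (1, 2, 3, 4, 5);
my @removed = splice(@a, 1, 2, 'x', 'y', 'z');   # remove 2 elements starting at offset 1

print "$seen{PaperSize} @removed @a\n";
```

The while loop stops because splice on an empty array returns an empty list, and a list assignment in scalar context yields the number of elements assigned.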
# We show this example of $PaperSizes{}[], which is a hash whose values are references to two-element arrays |
%PaperSizes = |
( |
A3 => [mm(297), mm(420)], # mm is a helper subroutine (not a Perl built-in) |
A4 => [mm(210), mm(297)], # to convert millimeters to points |
); |
so $PaperSizes{'A4'} returns a reference to a two-element array, and $PaperSizes{'A4'}[0] is its first element |
$self is of course pointer to datastructure and so following code will alter object passed in first argument |
sub papersize {
|
} |
Inheritance is implemented "by hand" using @ISA, which can be defined for any package and contains the list of packages to be searched for unsatisfied externals |
package Fred; |
require Exporter; # Make package Exporter available to Fred |
@ISA = qw(Exporter); # Exporter is to be searched for unsatisfied externals |
# See Exporter manual page for more details |
Of course AUTOLOAD mechanism kicks in as technique of last resort if cannot find a subroutine anywhere else |
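The @ISA lookup can be sketched with two small classes in one file. Animal and Dog are hypothetical names used only for illustration, and our assumes Perl 5.6 or later.

```perl
use strict;
use warnings;

# Animal and Dog are hypothetical classes used only for illustration.
package Animal;
sub new   { my ($class, %args) = @_; return bless { name => $args{name} }, $class; }
sub name  { my $self = shift; return $self->{name}; }
sub speak { my $self = shift; return $self->name . " makes a sound"; }

package Dog;
our @ISA = ('Animal');              # search Animal for methods Dog does not define
sub speak { my $self = shift; return $self->name . " barks"; }

package main;
my $dog      = Dog->new(name => 'Rex');   # new is not in Dog, so it is found via @ISA
my $class    = ref $dog;                  # bless recorded the class name Dog
my $sentence = $dog->speak;               # Dog's own speak overrides Animal's

print "$class: $sentence\n";
```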
There are the original cryptic two character names and those with a more mnemonic value which are accessible if one invokes
|
Here are a few examples -- we gave some of the predefined variables defining formatted output in first Perl foilset |
$ARG or $_ is default name when nothing specified
|
A match m/regexp/; or equivalent s pattern match sets |
$MATCH or $& -- The matched string |
$PREMATCH or $` to be string before matched string |
$POSTMATCH or $' to be string after matched string |
$LAST_PAREN_MATCH or $+ contains material in last parenthesis matched
|
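The mnemonic names can be tried with the English module from the standard library. This sketch uses an illustrative string and pattern.

```perl
use strict;
use warnings;
use English;

my $text = "Perl5 adds references";
my ($pre, $match, $post, $last, $aliases_agree);

if ( $text =~ /(add)(s)/ ) {
    ($pre, $match, $post) = ($PREMATCH, $MATCH, $POSTMATCH);
    $last          = $LAST_PAREN_MATCH;          # text of the last group that matched
    $aliases_agree = ( $MATCH eq $& ) ? 1 : 0;   # $MATCH and $& name the same variable
}

print "[$pre][$match][$post][$last]\n";
```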
q(string) or qDstringD for any delimiter D -- interprets string as a literal
|
q(string) is equivalent to 'string' except works even if unprotected ' in string |
qq(string) or qqDstringD is similarly equivalent to "string" except you do NOT need to protect " inside it
|
qx(string) is similarly equivalent to `string` --
|
quotemeta("string") protects all regular expression metacharacters
|
quotemeta("string") is equivalent to "\Qstring\E" where "string" is interpolated before protection |
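A brief sketch of the quoting operators and quotemeta together; the strings are illustrative.

```perl
use strict;
use warnings;

my $name   = "Perl";
my $single = q(It's $name);               # like '...': no interpolation, inner ' needs no escape
my $double = qq{Say "$name"};             # like "...": interpolates, inner " needs no escape
my @words  = qw(lc uc lcfirst ucfirst);   # a list of four quoted words

my $literal = "a.b";
my $escaped = quotemeta($literal);        # a backslash is put before the dot
my $loose   = ( "axb" =~ /^$literal$/ )      ? 1 : 0;   # unprotected . matches any character
my $strict  = ( "axb" =~ /^\Q$literal\E$/ )  ? 1 : 0;   # \Q...\E matches only a literal dot

print "$single | $double | ", scalar(@words), " | $escaped | $loose$strict\n";
```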
(?#comment) is a comment in a regular expression
|
/x modifier interprets whitespace in regular expression as for readability and not as "real" characters -- the Perl manual gives an example shown below which removes /* .. */ from C programs |
$program =~ s{
|
}[]gsx; # Replace with nothing (i.e. remove) and specify modifiers g s and x to be operational |
modifier g ensures we match and remove all /* .. */ strings |
modifier s treats newlines as part of string |
modifier x means that whitespace ignored and # treated like a comment in conventional Perl |
Note use of syntax s{old}[new] as equivalent to s%old%new% |
One can use any type of parentheses such as s(old){new} etc. |
The default pattern matching in Perl is greedy or maximal size matching |
There is now the ? option to designate the selection of match of minimum size |
* is replaced by *? to specify minimal match 0 or more times |
+? represents minimal match 1 or more times |
?? represents minimal match 0 or 1 times |
{n}? minimal match exactly n times |
{n,}? minimal match at least n times |
{n,m}? minimal match at least n but not more than m times |
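Greedy and minimal matching can be compared on the same string. This is a sketch with an illustrative input.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

my ($greedy)  = $html =~ /<(.+)>/;    # .+ takes as much as it can
my ($minimal) = $html =~ /<(.+?)>/;   # .+? stops at the first >

print "greedy:  $greedy\nminimal: $minimal\n";
```

The greedy form runs on to the last > in the string, while the minimal form captures only the first tag name.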
(?:regexp) means a simple grouping of regexp as a unit -- equivalent to (regexp) except it does not generate a $n reference |
regexp1(?=regexp2) matches to regexp1 followed by regexp2 but regexp2 is not considered part of match
|
regexp1(?!regexp2) matches to regexp1 NOT followed by regexp2 |
(?i) (?m) (?imsx) etc are equivalent to specifying modifiers i, m or i and m and s and x respectively |
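The grouping, lookahead and embedded-modifier forms above can be sketched as follows; the strings and patterns are illustrative.

```perl
use strict;
use warnings;

my $text = "price: 100 USD, 200 EUR";

my @usd     = $text =~ /(\d+)(?= USD)/g;      # digits followed by " USD"; the lookahead is not part of the match
my @not_usd = $text =~ /(\d+)\b(?! USD)/g;    # whole numbers NOT followed by " USD"

my ($unit) = "ab ab cd" =~ /(?:ab )+(\w+)/;   # (?: ) groups without creating a $n reference
my $ci     = ( "PERL" =~ /(?i)perl/ ) ? 1 : 0;   # (?i) acts like the i modifier

print "@usd @not_usd $unit $ci\n";
```

The \b in the negative lookahead pattern stops the engine from backtracking to a shorter run of digits such as 10.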
do "filename.pl"; or more generally do EXPR; # where EXPR returns a string which is taken as a filename |
Perl executes contents of file specified by string |
This is a good way of loading in a block of subroutines |
glob EXPR returns the value of EXPR with conventional UNIX shell filename expansion
|
We have learnt about \L (lower case characters until \E) and \l (lower case next character) and corresponding upper case \U \u so that
|
There are a set of function calls implementing these so that |
lc(STRING) converts STRING to lower case |
lcfirst(STRING) converts first character in STRING to lower case |
uc and ucfirst play same role for upper case |
ucfirst(lc 'fOX') returns 'Fox' |
defined(expr); # where expr is typically a variable such as $list[7] |
returns true if expr is defined (i.e. not equal to undef) |
undef $scalar; undef @list; undef %hash; # set all elements in passed reference to be undefined -- the argument can also be things like $hash{key};
|
undef can be very useful -- for instance you may wish to reuse a hash %parms and execute undef %parms before re-use. |
exists($hash{place}); returns true if place has been defined as a key to %hash-- note this tests existence of associative memory key -- the value $hash{place} may still be undefined! |
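The distinction between exists and defined shows up when a key is stored with an undefined value. A minimal sketch with illustrative keys:

```perl
use strict;
use warnings;

my %parms = ( place => undef, name => 'Fox' );

my $exists_place  = exists  $parms{place} ? 1 : 0;   # the key is present ...
my $defined_place = defined $parms{place} ? 1 : 0;   # ... but its value is undef
my $exists_other  = exists  $parms{other} ? 1 : 0;   # this key was never stored

undef %parms;                                        # empty the hash before re-use
my $left = scalar keys %parms;

print "$exists_place $defined_place $exists_other $left\n";
```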
@chars = map( expr , @nums); |
Here expr is some Expression or Block which accesses variable $_ |
map sets $_ to be successive values in list @nums and returns the successive results of executing BLOCK expr with each value of $_
|
A similar construct is grep which acts like map but returns a list containing just the entries in @nums for which expr is TRUE
|
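Both forms of map, and grep for comparison, in a short sketch with illustrative data:

```perl
use strict;
use warnings;

my @nums  = (72, 105, 33);
my @chars = map( chr($_), @nums );        # expression form
my @twice = map { $_ * 2 } @nums;         # block form
my @odd   = grep { $_ % 2 } @nums;        # keep entries for which the block is true

print join('', @chars), " @twice @odd\n";
```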
pack and unpack are generalized tr-like functions which convert a list to a string (pack) or a string to a list/array (unpack) according to a template |
We can illustrate with a fragment from a CGI script that reads and interprets data sent to a server in the peculiar application/x-www-form-urlencoded coding scheme |
$value =~ tr/+/ /; # Convert + back to blanks |
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; |
The last line finds encoded hex characters %XY with X,Y in set A-F,a-f,0-9 and replaces them by the "real" character representation
|
General syntax is pack(TEMPLATE,LIST) where TEMPLATE is a string of identifiers specifying output style of successive characters |
(A is Ascii, l signed long, p pointer, f Float etc.) |
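The decoding fragment can be run on a sample value, followed by a pack/unpack round trip. This is a sketch: the input string is invented, and the template letter n (a 16-bit big-endian integer) is chosen here only for illustration.

```perl
use strict;
use warnings;

my $value = "Hello+World%21+50%25";
$value =~ tr/+/ /;                                             # + back to blank
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;   # %XY back to one character

my $packed = pack("A3 n", "abcdef", 258);     # 3 ASCII characters, then a 16-bit big-endian integer
my ($str, $num) = unpack("A3 n", $packed);    # the same template reverses the packing

print "$value | $str $num | ", length($packed), " bytes\n";
```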
ref EXPR returns FALSE unless EXPR is a reference (pointer) |
If it is a reference, then ref returns
|
scalar EXPR forces EXPR to be evaluated in scalar (as opposed to list) context and returns scalar result |
We do not need a "list" command as [ ] constructs anonymous arrays and ( ... ) generates a list trivially |
tie() and untie() are described in PerlTIE Man Page |
These generate "enchanted" variables (magical blessing!) which allow one to access what seems to be an ordinary variable in a Perl program but, behind the scenes, the implementation of the variable does a lot of work which could include computation, data access etc. |
Examples in PerlTIE include: |
tie a scalar $priority which returns process or user priority by accessing UNIX system |
tie a hash to a database so that |
$tiedhash{lookupkey} returns value of lookupkey in database |
This latter tie is most powerful and could involve SQL access to the database server when tied hash accessed |
tie VARIABLE, CLASSNAME, LIST |
VARIABLE is a scalar, array or hash to be enchanted |
Note a given type of tie can have several variables tied which differ by their initial conditions which are specified in LIST which is handed to constructor |
The CLASSNAME is a module which must have some special ENCHANTED routines defined -- these are |
constructors TIESCALAR() TIEARRAY() or TIEHASH() for three variable types respectively and further functions to define operational access to variables which are listed on following foil |
FETCH -- get variable value |
STORE -- store variable value |
DESTROY -- destroy variable |
Tied hashes also should provide
|
Note User provides these functions as they depend on particular behind the scenes manipulations |
Note standard classes provide ability to specify user routines BEGIN and END to be run at beginning and end of invocation of a class |
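A tied scalar can be sketched in a few lines. Counter is a hypothetical class written only for this illustration: every read of the tied variable returns the next integer, and the starting value arrives through the LIST argument of tie.

```perl
use strict;
use warnings;

# Counter is a hypothetical class: each read of the tied scalar returns the next integer.
package Counter;
sub TIESCALAR { my ($class, $start) = @_; my $n = $start; return bless \$n, $class; }
sub FETCH     { my $self = shift; return $$self++; }
sub STORE     { my ($self, $value) = @_; $$self = $value; }

package main;
tie my $count, 'Counter', 5;     # LIST (here just 5) is handed to TIESCALAR
my $first  = $count;             # FETCH runs and returns 5
my $second = $count;             # FETCH runs again and returns 6
$count = 100;                    # STORE runs
my $third  = $count;             # FETCH returns 100
untie $count;

print "$first $second $third\n";
```

To the code in package main, $count looks like an ordinary scalar; all of the behaviour lives in the class named in the tie call.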
One can interface Perl5 code with C using an interface constructor called xsubpp |
This manipulates an existing C library and allows it to be accessed through a designated Perl module |
Simple datatypes can be handled automatically but user must manipulate complex C datastructures in special xsub code |
see PerlXS and PerlXSTUT manual entries |
Also PerlCALL and PerlEMBED manual pages describe how to call Perl from C |
One can add a Perl Interpreter to your C program, execute a Perl statement such as a pattern match or execute a full Perl subroutine |