Perl
Perl is a general-purpose programming language originally developed for text manipulation. Today it's used for a wide range of tasks including system administration, web development, network programming, GUI development, and others.
The Perl language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). Its major features include support for multiple programming paradigms (procedural programming, object-oriented programming, and functional programming styles), automatic memory management, built-in support for text processing, and a large collection of third-party modules.
Hello World
The simplest Hello World program in Perl:
#!/usr/bin/perl
print "Hello, world!\n";
Enabling the strict and warnings pragmas in every program is strongly recommended. They alert the programmer to many potential sources of mistakes or unsafe operations. A small program with both enabled looks like this:
#!/usr/bin/perl
use strict;
use warnings;
print "Hello, world!\n";
Features
The overall structure of Perl derives broadly from C. Perl is procedural in nature, with variables, expressions, assignment statements, brace-delimited code blocks, control structures, and subroutines.
Perl also takes features from shell programming. All variables are marked with leading sigils, which unambiguously identify the data type (scalar, array, hash, etc.) of the variable in context. Importantly, sigils allow variables to be interpolated directly into strings. Perl has many built-in functions which provide tools often used in shell programming (though many of these tools are implemented by programs external to the shell) like sorting, and calling on system facilities.
Perl takes lists from Lisp, associative arrays (hashes) from AWK, and regular expressions from sed. These simplify and facilitate many parsing, text handling, and data management tasks.
In Perl 5, features were added that support complex data structures, first-class functions (i.e. closures as values), and an object-oriented programming model. These include references, packages, class-based method dispatch, and lexically scoped variables, along with compiler directives (for example, the strict pragma). A major additional feature introduced with Perl 5 was the ability to package code as reusable modules. Larry Wall later stated that "The whole intent of Perl 5's module system was to encourage the growth of Perl culture rather than the Perl core."
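As a rough illustration of the Perl 5 features named above, the following sketch shows a closure over a lexically scoped variable and a nested data structure built with references. The names make_counter and %pets are hypothetical and chosen only for this example.

```perl
use strict;
use warnings;

# A closure: the anonymous sub keeps access to the lexical $count
# even after make_counter() has returned.
sub make_counter {
    my $count = shift;
    $count = 0 unless defined $count;
    return sub { return $count++ };
}

my $counter = make_counter(5);
my $first   = $counter->();    # 5
my $second  = $counter->();    # 6

# References allow nested data structures: a hash of array references.
my %pets = (
    cats => [ 'Tom', 'Felix' ],
    dogs => [ 'Rex' ],
);
push @{ $pets{dogs} }, 'Fido';
my $dog_count = scalar @{ $pets{dogs} };    # 2
```

Each call to make_counter produces an independent counter, because every call creates a fresh $count.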
All versions of Perl do automatic data typing and memory management. The interpreter knows the type and storage requirements of every data object in the program; it allocates and frees storage for them as necessary. Legal type conversions are done automatically at run time; illegal type conversions are fatal errors.
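A small sketch of these conversions, under the assumption that dereferencing a plain string while strict is in effect counts as an "illegal" conversion in the sense used above. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $sum   = '10' + 5;             # string converted to a number: 15
my $count = 3;
my $label = $count . ' apples';   # number converted to a string: "3 apples"

# Treating an ordinary string as an array reference is not a legal
# conversion under "use strict"; it dies at run time. The block form
# of eval traps the error so the program can inspect it.
my $not_a_ref = 'plain string';
my $ok    = eval { my @items = @{$not_a_ref}; 1 };
my $error = $@;                   # mentions "strict refs"
```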
Design
The design of Perl can be understood as a response to three broad trends in the computer industry: falling hardware costs, rising labor costs, and improvements in compiler technology. Many earlier computer languages, such as Fortran and C, were designed to make efficient use of expensive computer hardware. In contrast, Perl is designed to make efficient use of expensive computer programmers.
Perl has many features that ease the programmer's task at the expense of greater CPU and memory requirements. These include automatic memory management; dynamic typing; strings, lists, and hashes; regular expressions; introspection and an eval() function.
The design of Perl is very much informed by linguistic principles. Examples include Huffman coding (common constructions should be short), good end-weighting (the important information should come first), and a large collection of language primitives. Perl favors language constructs that are natural for humans to read and write, even where they complicate the Perl interpreter.
Perl syntax reflects the idea that "things that are different should look different". For example, scalars, arrays, and hashes have different leading sigils. Array indices and hash keys use different kinds of braces. Strings and regular expressions have different standard delimiters. This approach can be contrasted with languages like Lisp, where the same S-expression construct and basic syntax is used for many different purposes.
Perl does not enforce any particular programming paradigm (procedural, object-oriented, functional, etc.), or even require the programmer to choose among them.
There is a broad practical bent to both the Perl language and the community and culture that surround it. The preface to Programming Perl begins, "Perl is a language for getting your job done." One consequence of this is that Perl is not a tidy language. It includes features if people use them, tolerates exceptions to its rules, and employs heuristics to resolve syntactical ambiguities. Because of the forgiving nature of the compiler, bugs can be hard to find sometimes. Discussing the variant behaviour of built-in functions in list and scalar contexts, the perlfunc(1) manual page says "In general, they do what you want, unless you want consistency."
Perl has several mottos that convey aspects of its design and use. One is "There's more than one way to do it." (TIMTOWTDI, usually pronounced 'Tim Toady'). Others are "Perl: the Swiss Army Chainsaw of Programming Languages" and "No unnecessary limits". A stated design goal of Perl is to make easy tasks easy and difficult tasks possible. Perl has also been called "The Duct Tape of the Internet".
There is no written specification or standard for the Perl language, and no plans to create one for the current version of Perl. There has only ever been one implementation of the interpreter. That interpreter, together with its functional tests, stands as a de facto specification of the language.
Applications
Perl has many and varied applications, compounded by the availability of many standard and third-party modules.
Perl has been used since the early days of the Web to write CGI scripts. It is known as one of "the three Ps" (along with Python and PHP), the most popular dynamic languages for writing Web applications (which now also include Ruby). It is also an integral component of the popular LAMP solution stack for web development. Large projects written in Perl include Slash, IMDb and UseModWiki, an early, influential wiki engine. Many high-traffic websites, such as Amazon.com, Livejournal.com and Ticketmaster.com use Perl extensively.
Perl is often used as a glue language, tying together systems and interfaces that were not specifically designed to interoperate, and for "data munging", i.e., converting or processing large amounts of data for tasks like creating reports. In fact, these strengths are intimately linked. The combination makes Perl a popular all-purpose tool for system administrators, particularly as short programs can be entered and run on a single command line.
Perl is also widely used in finance and bioinformatics, where it is valued for rapid application development and deployment, and the ability to handle large data sets.
Implementation
Perl is implemented as a core interpreter, written in C, together with a large collection of modules, written in Perl and C. The interpreter is 150,000 lines of C code and compiles to a 1 MB executable on typical machine architectures. Alternatively, the interpreter can be compiled to a link library and embedded in other programs. There are nearly 500 modules in the distribution, comprising 200,000 lines of Perl and an additional 350,000 lines of C code. (Much of the C code in the modules consists of character encoding tables.)
The interpreter has an object-oriented architecture. All of the elements of the Perl language - scalars, arrays, hashes, coderefs, file handles - are represented in the interpreter by C structs. Operations on these structs are defined by a large collection of macros, typedefs and functions; these constitute the Perl C API. The Perl API can be bewildering to the uninitiated, but its entry points follow a consistent naming scheme, which provides guidance to those who use it.
The execution of a Perl program divides broadly into two phases: compile-time and run-time. At compile time, the interpreter parses the program text into a syntax tree. At run time, it executes the program by walking the tree. The text is parsed only once, and the syntax tree is subject to optimization before it is executed, so the execution phase is relatively efficient. Compile-time optimizations on the syntax tree include constant folding, context propagation, and peephole optimization. However, the compile-time and run-time phases may nest: BEGIN code blocks execute at compile time, while the eval function initiates compilation at run time. Both operations are an implicit part of a number of others; most notably, the use statement that loads libraries (known in Perl as modules) implies a BEGIN block.
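The nesting of the two phases can be observed directly. In this minimal sketch (the variable names are illustrative), a BEGIN block that appears later in the file still runs before the ordinary statement above it, and a string eval compiles new code while the program is already running.

```perl
use strict;
use warnings;

our @order;

push @order, 'run-time statement';

# BEGIN blocks run as soon as they have been compiled, so this push
# happens before the run-time push above, despite appearing after it.
BEGIN { push @order, 'BEGIN block' }

# String eval compiles and runs new code during the run-time phase.
my $sum = eval '2 + 3 * 4';      # 14

# A syntax error in the evaluated text does not abort the program:
# eval returns undef and the message is left in $@.
my $broken = eval '2 + )';
my $error  = $@;
```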
Perl has a context-sensitive grammar which can be affected by code executed during an intermittent run-time phase. Therefore Perl cannot be parsed by a straight Lex/Yacc lexer/parser combination. Instead, the interpreter implements its own lexer, which coordinates with a modified GNU bison parser to resolve ambiguities in the language. It is said that "only perl can parse Perl", meaning that only the Perl interpreter (perl) can parse the Perl language (Perl). The truth of this is attested to by the persistent imperfections of other programs that undertake to parse Perl, such as source code analyzers and auto-indenters, which have to contend not only with the many ways to express unambiguous syntactic constructs, but also the fact that Perl cannot be parsed in the general case without executing it.
Perl is distributed with some 120,000 functional tests. These run as part of the normal build process, and extensively exercise the interpreter and its core modules. Perl developers rely on the functional tests to ensure that changes to the interpreter do not introduce bugs; conversely, Perl users who see the interpreter pass its functional tests on their system can have a high degree of confidence that it is working properly.
Maintenance of the Perl interpreter has become increasingly difficult over the years. The code base has been in continuous development since 1994. The code has been optimized for performance at the expense of simplicity, clarity, and strong internal interfaces. New features have been added, yet virtually complete backward compatibility with earlier versions is maintained. The size and complexity of the interpreter is a barrier to developers who wish to work on it.
Language Structure
Variable Use in Perl
Variables are indicated with a leading non-alphanumeric symbol (known as a sigil) which indicates the variable's type. There are three types of Perl variables: scalars (indicated by $), arrays (indicated by @), and hashes (indicated by %).
When the strict pragma is in effect, variables must be declared before use, typically with the my operator. Failure to do so raises an error and aborts the script at compile time. An exception to this rule is the vars pragma, which lets you predeclare global variables. It comes in handy for variables that will live all the way to the end of the script.
An example using the vars pragma:
#!/usr/bin/perl
use strict;
use warnings;
use vars qw($dog @griffins);
$dog = 'Brian';
@griffins = qw(Peter Lois Stewie Meg Chris);
An example using the my operator:
#!/usr/bin/perl
use strict;
use warnings;
my $x = 1;
$x = "Hello";
$x = 1.039;              # Scalars can hold numbers or strings.
my @array = (1, 2, 3);   # Arrays can hold a number of scalars.
$x = $array[0];
foreach my $x (@array) {
    print "$x + ";       # This will print '1 + 2 + 3 + '
}
my %hash = ("name", "tester", "age", 3);    # A hash can be initialized from a list with an even number of elements.
$x = $hash{'name'};      # $x will contain the value of the key 'name', which is 'tester'.
%hash = ('name' => 'tester', 'age' => 3);   # For convenience, you can replace the commas with '=>'.
@array = keys %hash;     # 'keys' will return the keys in the hash %hash.
@array = values %hash;   # 'values' will return the values in the hash %hash.
Differences between "string" and 'string'
In Perl, both "test" and 'test' will evaluate to a string containing four characters ('t', 'e', 's' and 't'). However, a double-quoted string is interpolated: a variable in a double-quoted string will be replaced with its value. A single quoted string will not be interpolated.
#!/usr/bin/perl
use strict;
use warnings;
my $y = "the cat";
print "I like $y";   # This line will print 'I like the cat'
print 'I like $y';   # This line will print 'I like $y'
Additionally, double-quoted strings can be written with the qq operator, while single-quoted strings can be written with the q operator.
$string = qq(I like $y);   # Will evaluate to 'I like the cat'
$string = q(I like $y);    # Will evaluate to 'I like $y'
Both methods of declaring variables under strict (the my operator and the vars pragma) are fine, although some programmers use the vars pragma to declare the most important variables (those used throughout almost the entire script) in one place, making them easier to remember and find.
Data types
Perl has a number of fundamental data types, the most commonly used and discussed being: scalars, arrays, hashes, filehandles and subroutines:
- A scalar is a single value; it may be a number, a string or a reference
- An array is an ordered collection of scalars
- A hash, or associative array, is a map from strings to scalars; the strings are called keys and the scalars are called values.
- A filehandle is a map to a file, device, or pipe which is open for reading, writing, or both.
- A subroutine is a piece of code that may be passed arguments, executed, and may return data
Most variables are marked by a leading sigil, which identifies the data type being accessed (not the type of the variable itself), except filehandles, which don't have a sigil. The same name may be used for variables of different types, without conflict.
$foo   # a scalar
@foo   # an array
%foo   # a hash
FOO    # a file handle
&foo   # a subroutine
(Note: file handles need not be uppercase, but it is a common convention owing to the fact that there is no sigil to denote them.)
Numbers are written in the bare form; strings are enclosed by quotes of various kinds.
$n = 42;
$name = "joe";
$color = 'red';
$animal = qq!frog!;
Perl will convert strings into numbers and vice versa depending on the context in which they are used. In the following example the strings $n and $m are treated as numbers when they are the arguments to the addition operator. This code prints the number '5': the non-numeric part of each string is discarded for the operation, although the variable values remain the same. (The string concatenation operator is not +, but .)
$n = "3 apples";
$m = "2 oranges";
print $n + $m;
Perl also has a boolean context that it uses in evaluating conditional statements. The following values all evaluate as false in Perl:
$false = 0;       # the number zero
$false = 0.0;     # the number zero as a float
$false = '0';     # the string zero
$false = "";      # the empty string
$false = undef;   # the return value from undef
All other values are evaluated to true. This includes the odd self-describing string of "0 but true", which in fact is 0 as a number, but true when used as a boolean. (Any non-numeric string would also have this property, but this particular string is ignored by Perl with respect to numeric warnings.) A less explicit but more conceptually portable version of this string is '0E0' or '0e0', which does not rely on characters being evaluated as 0, as '0E0' is literally "zero times ten to the zeroth power."
Evaluated boolean expressions also return scalar values. Although the documentation does not promise which particular true or false is returned (and thus cannot be relied on), many boolean operators return 1 for true and the empty-string for false (which evaluates to zero in a numeric context). The defined() function tells if the variable has any value set. In the above examples defined($false) is true for every value except undef.
If a result of exactly 1 or 0 (as in C) is needed, some authors consider an explicit conversion to be required:
my $real_result = $boolean_result ? 1 : 0;
However, an implicit conversion can be used instead:
my $real_result = $boolean_result + 0;
A list is written by listing its elements, separated by commas, and enclosed by parentheses where required by operator precedence.
@scores = (32, 45, 16, 5);
The same list can also be produced in several other ways:
@scores = qw(32 45 16 5);
@scores = split /-/, '32-45-16-5';
push @scores, $_ for 32, 45, 16, 5;
A hash may be initialized from a list of key/value pairs.
%favorite = (joe => 'red', sam => 'blue');
Or it may simply be defined piece by piece:
$favorite{joe} = 'red';
$favorite{sam} = 'blue';
Individual elements of a list are accessed by providing a numerical index, in square brackets. Individual values in a hash are accessed by providing the corresponding key, in curly braces. The $ sigil identifies the accessed element as a scalar.
$scores[2]       # an element of @scores
$favorite{joe}   # a value in %favorite
Multiple elements may be accessed by using the @ sigil instead (identifying the result as a list).
@scores[2, 3, 1]          # three elements of @scores
@favorite{'joe', 'sam'}   # two values in %favorite
The number of elements in an array can be obtained by evaluating the array in scalar context or with the help of the $# sigil. The latter gives the index of the last element in the array, not the number of elements.
$count = @friends;
$#friends       # the index of the last element in @friends
$#friends+1     # usually the number of elements in @friends
                # this is one more than $#friends because the first element is at
                # index 0, not 1, unless the programmer reset this to a
                # different value, which most Perl manuals discourage.
There are a few functions that operate on entire hashes.
@names = keys %address;
@addresses = values %address;
1 while ($name, $address) = each %address;
Control structures
Perl has several kinds of control structures.
It has block-oriented control structures, similar to those in the C and Java programming languages. Conditions are surrounded by parentheses, and controlled blocks are surrounded by braces:
label while ( cond ) { ... }
label while ( cond ) { ... } continue { ... }
label for ( init-expr ; cond-expr ; incr-expr ) { ... }
label foreach var ( list ) { ... }
label foreach var ( list ) { ... } continue { ... }
if ( cond ) { ... }
if ( cond ) { ... } else { ... }
if ( cond ) { ... } elsif ( cond ) { ... } else { ... }
Where only a single statement is being controlled, statement modifiers provide a lighter syntax:
statement if cond ;
statement unless cond ;
statement while cond ;
statement until cond ;
statement foreach list ;
Short-circuit logical operators are commonly used to affect control flow at the expression level:
expr and expr
expr or expr
The flow control keywords next, last, return, and redo are expressions, so they can be used with short-circuit operators.
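For instance, a loop body can be written as a series of guarded flow-control expressions. This is a minimal sketch with made-up input data:

```perl
use strict;
use warnings;

my @lines = ('# comment', 'alpha', '', 'beta', '__END__', 'gamma');
my @kept;

for my $line (@lines) {
    $line eq '__END__' and last;    # stop at the end marker
    $line =~ /^#/      and next;    # skip comment lines
    length($line)      or  next;    # skip empty lines
    push @kept, $line;
}
# @kept now holds ('alpha', 'beta')
```

The low-precedence and/or operators are used here rather than && and || so that the expression on the left is evaluated as a whole before the flow-control keyword is considered.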
Perl also has two implicit looping constructs:
results = grep { ... } list
results = map { ... } list
grep returns all elements of list for which the controlled block evaluates to true. map evaluates the controlled block for each element of list and returns a list of the resulting values. These constructs enable a simple functional programming style.
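A short sketch of both constructs, using illustrative data. Inside each block, $_ is set to the current element.

```perl
use strict;
use warnings;

my @numbers = (1 .. 10);

# grep keeps the elements for which the block evaluates to true.
my @even = grep { $_ % 2 == 0 } @numbers;     # (2, 4, 6, 8, 10)

# map evaluates the block for each element and collects the results.
my @squares = map { $_ * $_ } @even;          # (4, 16, 36, 64, 100)

# The two compose, reading from right to left: filter first, then transform.
my @long_upper = map { uc $_ } grep { length($_) > 3 } qw(perl awk sed lisp c);
# ('PERL', 'LISP')
```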
There is no switch statement (multi-way branch) in Perl 5. The Perl documentation describes a half-dozen ways to achieve the same effect by using other control structures. There is a Switch module, however, which provides functionality modeled on the forthcoming Perl 6 re-design.
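One commonly seen way to emulate a multi-way branch with existing control structures is a single-pass for loop that aliases $_ to the value being tested. The sketch below is only one possible idiom, and classify is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# classify() is a hypothetical helper used only for this illustration.
sub classify {
    my ($value) = @_;
    for ($value) {                        # aliases $_ to $value
        /^\d+$/      and return 'integer';
        /^\d*\.\d+$/ and return 'decimal';
        /^\s*$/      and return 'blank';
        return 'text';                    # default case
    }
}

my @kinds = map { classify($_) } ('42', '3.14', '', 'abc');
# ('integer', 'decimal', 'blank', 'text')
```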
Perl includes a goto label statement, but it is rarely used. Situations where a goto is called for in other languages don't occur as often in Perl due to its breadth of flow control options.
There is also a goto &sub statement that performs a tail call. It terminates the current subroutine and immediately calls the specified sub. This is used in situations where a caller can perform more efficient stack management than Perl itself (typically because no change to the current stack is required), and in deep recursion tail calling can have substantial positive impact on performance because it avoids the overhead of scope/stack management on return.
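A minimal sketch of the tail-call form, using a hypothetical accumulator-style sub. The new arguments are placed in @_ before the jump, because goto &sub passes the current @_ to the target.

```perl
use strict;
use warnings;

# Hypothetical example: sum the integers 1..$n using an accumulator.
sub sum_to {
    my ($n, $acc) = @_;
    $acc = 0 unless defined $acc;
    return $acc if $n <= 0;
    @_ = ($n - 1, $acc + $n);    # arguments for the next call
    goto &sum_to;                # replaces the current call instead of nesting
}

my $total = sum_to(1000);        # 500500
```

Because each goto replaces the current call rather than nesting a new one, the call depth stays constant regardless of $n.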
Subroutines
Subroutines are defined with the sub keyword, and invoked simply by naming them. If the subroutine in question has not yet been declared, parentheses are required for proper parsing.
foo(); # parentheses required here...
sub foo { ... }
foo; # ... but not here
A list of arguments may be provided after the subroutine name. Arguments may be scalars, lists, or hashes.
foo $x, @y, %z;
The parameters to a subroutine need not be declared as to either number or type; in fact, they may vary from call to call. Arrays are expanded to their elements, hashes are expanded to a list of key/value pairs, and the whole lot is passed into the subroutine as one undifferentiated list of scalars.
Whatever arguments are passed are available to the subroutine in the special array @_. The elements of @_ are aliased to the actual arguments; changing an element of @_ changes the corresponding argument.
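The flattening of arguments can be seen by counting what arrives in @_. In this sketch, count_args is a hypothetical helper; passing references (written with a backslash) is shown as one way to keep an array and a hash distinct.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many scalars the sub received.
sub count_args { return scalar @_ }

my @y = (1, 2, 3);
my %z = (a => 1, b => 2);

my $flat = count_args('x', @y, %z);      # 1 + 3 + 4 = 8 scalars
my $refs = count_args('x', \@y, \%z);    # 3 scalars: a string and two references
```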
Elements of @_ may be accessed by subscripting it in the usual way.
$_[0], $_[1]
However, the resulting code can be difficult to read, and the parameters have pass-by-reference semantics, which may be undesirable.
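The aliasing behavior can be demonstrated with two hypothetical subs, one that writes through @_ and one that works on a copy:

```perl
use strict;
use warnings;

# Modifies the caller's variables through the aliases in @_.
sub double_in_place {
    $_ *= 2 for @_;
    return;
}

# Copies the arguments first, so the caller's variables are untouched.
sub doubled_copy {
    my @copy = @_;
    $_ *= 2 for @copy;
    return @copy;
}

my ($p, $q) = (3, 4);
double_in_place($p, $q);              # $p is now 6, $q is now 8

my @result = doubled_copy($p, $q);    # (12, 16); $p and $q are still 6 and 8
```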
One common idiom is to assign @_ to a list of named variables.
my($x, $y, $z) = @_;
This provides both mnemonic parameter names and pass-by-value semantics. The my keyword indicates that the following variables are lexically scoped to the containing block.
Another idiom is to shift parameters off @_. This is especially common when the subroutine takes only one argument.
my $x = shift;
Subroutines may return values.
return 42, $x, @y, %z;
If the subroutine does not exit via a return statement, then it returns the last expression evaluated within the subroutine body. Arrays and hashes in the return value are expanded to lists of scalars, just as they are for arguments.
The returned expression is evaluated in the calling context of the subroutine; this can surprise the unwary.
sub list { (4, 5, 6) }
sub array { @x = (4, 5, 6); @x }
$x = list; # returns 6 - last element of the list
$x = array; # returns 3 - number of elements in the array
@x = list; # returns (4, 5, 6)
@x = array; # returns (4, 5, 6)
A subroutine can discover its calling context with the wantarray function.
sub either { wantarray ? (1, 2) : "Oranges" }
$x = either; # returns "Oranges"
@x = either; # returns (1, 2)
Regular expressions
The Perl language includes a specialized syntax for writing regular expressions (RE), and the interpreter contains an engine for matching strings to regular expressions. The regular expression engine uses a backtracking algorithm, extending its capabilities from simple pattern matching to string capture and substitution. The regular expression engine is derived from regex written by Henry Spencer.
The Perl regular expression syntax was originally taken from Unix Version 8 regular expressions. However, it diverged before the first release of Perl, and has since grown to include many more features. Other languages and applications, including PHP, Ruby, Java, and the Apache HTTP server, are now adopting Perl-compatible regular expressions over POSIX regular expressions.
The m// (match) operator introduces a regular expression match. (The leading m may be omitted for brevity.) In the simplest case, an expression like
$x =~ m/abc/
evaluates to true if and only if the string $x matches the regular expression abc.
Portions of a regular expression may be enclosed in parentheses; corresponding portions of a matching string are captured. Captured strings are assigned to the sequential built-in variables $1, $2, $3, ..., and a list of captured strings is returned as the value of the match.
$x =~ m/a(.)c/; # capture the character between 'a' and 'c'
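Both ways of retrieving captures are sketched below with an illustrative date string: the list returned by the match, and the numbered variables.

```perl
use strict;
use warnings;

my $date = '2024-03-15';

# In list context, the match returns the captured substrings.
my ($year, $month, $day) = $date =~ m/(\d{4})-(\d\d)-(\d\d)/;

# After a successful match, the same captures are available in $1, $2, ...
my $summary = '';
if ($date =~ m/(\d{4})-(\d\d)/) {
    $summary = "year $1, month $2";
}
```

Checking that the match succeeded before reading $1 and $2 matters, because a failed match leaves them holding values from an earlier successful match.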
The s/// (substitute) operator specifies a search and replace operation:
$x =~ s/abc/aBc/; # upcase the b
Perl regular expressions can take modifiers. These are single-letter suffixes that modify the meaning of the expression:
$x =~ m/abc/i;      # case-insensitive pattern match
$x =~ s/abc/aBc/g;  # global search and replace
Regular expressions can be dense and cryptic. This is because regular expression syntax is extremely compact, generally using single characters or character pairs to represent its operations. Perl provides some relief from this problem with the /x modifier, which allows programmers to place whitespace and comments inside regular expressions:
$x =~ m/a # match 'a'
. # match any character
c # match 'c'
/x;
One common use of regular expressions is to specify delimiters for the split function:
@words = split m/,/, $line;
The split function creates a list of the parts of the string separated by matches of the regular expression. In this example, a line is divided into a list of its comma-separated parts, and this list is then assigned to the @words array.
Database interfaces
Perl is widely favored for database applications. Its text handling facilities are good for generating SQL queries; arrays, hashes and automatic memory management make it easy to collect and process the returned data.
In early versions of Perl, database interfaces were created by relinking the interpreter with a client-side database library. This was somewhat clumsy; a particular problem was that the resulting perl executable was restricted to using just the one database interface that it was linked to. Also, relinking the interpreter was sufficiently difficult that it was only done for a few of the most important and widely used databases.
In Perl 5, database interfaces are implemented by Perl DBI modules. The DBI (Database Interface) module presents a single, database-independent interface to Perl applications, while the DBD:: (Database Driver) modules handle the details of accessing some 50 different databases. There are DBD:: drivers for most ANSI SQL databases.
Recommended reading
- Learning Perl, by Randal L. Schwartz and Tom Phoenix: a great way to start. Many Perl programmers started with this book.
- Learning Perl Objects, References, and Modules, by Randal L. Schwartz: a great book to continue with once your scripts start to exceed 500 lines of code.