SJCC CIS 24: Class 5 (10/9) Lecture Notes

CIS 24: CGI and Perl Programming for the Web

Class 5 (10/9) Lecture Notes

Topics

Review of Last Week's Lab Assignment
More Pattern Matching Metacharacters
Pattern Sequence Scalar Variables
Pattern Matching Options
The Substitution Operator
The Translation Operator
Hashes
The %ENV System Hash
More on Arrays
Lab: Exercises


Review of Last Week's Lab Assignment

Write a script that:
- Prints the message "Ask me a question politely"
- Accepts input from the user and assigns it to the variable $question
- If $question contains the word "please", print the message "Thank you for being polite", otherwise print the message "That was not very polite"
```
print "Ask me a question politely: ";
chomp($question = <STDIN>);
if ($question =~ /please/) {
	print "Thank you for being polite\n";
}

else {
	print "That was not very polite\n";
}
```
Write a script that accepts a line of input from the user. Your script should then print the number of words that the user entered. You can do this by using the split function to split the line of input on the whitespace between the words. Your script must be able to handle multiple whitespace characters in a row (i.e. spaces or tabs).
```
print "please type some words: ";
chomp($line = <STDIN>);
@words = split(/[ \t]+/, $line);
$wordcount = @words;
print "Total number of words: $wordcount\n";
```
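One edge case to be aware of: if the line starts with spaces or tabs, splitting on /[ \t]+/ returns an empty string as the first field, so the count comes out one too high. The sketch below uses a hard-coded line in place of <STDIN> and compares that pattern with split's special case ' ' (a single space in quotes), which skips leading whitespace. The variable names are only for illustration.

```perl
# A line with leading spaces and a mix of spaces and tabs between words
$line = "   one two\t\tthree";

# Splitting on the pattern leaves an empty string as the first field
@fields = split(/[ \t]+/, $line);
$patterncount = @fields;

# The special pattern ' ' ignores leading whitespace
@words = split(' ', $line);
$wordcount = @words;

print "split on /[ \\t]+/: $patterncount fields\n";
print "split on ' ': $wordcount words\n";
```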
Write a script that has the user enter the name of a scalar or array variable. Your script should check the variable name to make sure it is valid, and then print a message accordingly. That is, your script should check to make sure the first character is a $ or @, that the second character is a letter or underscore, and that all of the subsequent characters are letters, underscores, or numbers. Remember that both capital and lower case letters are acceptable in variable names.
```
print "Enter a variable name: ";
chomp($varname = <STDIN>);
if ($varname =~ /^[\$@][_a-zA-Z]\w*$/) {
	print "$varname is a valid variable name\n";
}

else {
	print "$varname is not a valid variable name\n";
}
```
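The ^ and $ anchors matter for this exercise: without them the pattern only has to match somewhere inside the input, so a string with extra characters before or after a valid name would be accepted. Here is a small sketch that runs an anchored version of the pattern against a list of made-up test names instead of reading from <STDIN>; the names and the two counters are assumptions for illustration.

```perl
# Made-up test inputs; the last two should be rejected
@names = ('$count', '@my_list', '$_x9', '$9lives', 'oops$bar!');
$validcount = 0;
$invalidcount = 0;

foreach $name (@names) {
	# ^ and $ force the pattern to cover the whole string
	if ($name =~ /^[\$@][_a-zA-Z]\w*$/) {
		print "$name is a valid variable name\n";
		$validcount++;
	}
	else {
		print "$name is not a valid variable name\n";
		$invalidcount++;
	}
}
```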

More Pattern Matching Metacharacters

Word Boundary Pattern Anchors: \b and \B
\b requires the match to fall on a word boundary. For example:
```
$string = "Mike 59 hello";
$string =~ /\b59/;
print "the matched pattern in \$string: $&\n";
```
\B is the opposite: it matches only where there is no word boundary, such as within a word:
```
$string = "Mike 59 hello";
$string =~ /\Bike/;
print "the matched pattern in \$string: $&\n";
```
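A "word" is assumed to contain letters, digits, and underscore characters, and nothing else. A word boundary is the position between a word character and a character that is not a word character (the start or end of the string also counts, if a word character is next to it).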
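To make the boundary rule concrete, here is a short sketch that tries four patterns against the same sample string and records whether each one matches. The result variable names are invented for this example.

```perl
$string = "Mike 59 hello";

# There is a boundary between the space and the "5"
$atboundary = ($string =~ /\b59/) ? "yes" : "no";

# "ike" follows the word character "M", so there is no boundary before it
$ikeboundary = ($string =~ /\bike/) ? "yes" : "no";

# \B requires that there is no boundary, so this one matches
$ikeinside = ($string =~ /\Bike/) ? "yes" : "no";

# The end of the string counts as a boundary after the "o"
$wholeword = ($string =~ /\bhello\b/) ? "yes" : "no";

print "\\b59: $atboundary\n";
print "\\bike: $ikeboundary\n";
print "\\Bike: $ikeinside\n";
print "\\bhello\\b: $wholeword\n";
```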
Wildcard: the . Metacharacter
The period (.) special character will match any character except a newline. It is basically Perl's "wildcard" character for pattern matching. For example:
```
$string = "Mike 59 hello";
$string =~ /M.ke/;
print "the matched pattern in \$string: $&\n";
```
The . is often used in conjunction with the * character. For example:
```
$string = "Mike 59 hello";
$string =~ /M.*9/;
print "the matched pattern in \$string: $&\n";
```
Specifying Choices: the | Metacharacter
The "pipe" (|) special character allows you to specify two or more alternatives to choose from when matching a pattern. For example, if you had an HTML input form where you wanted to allow users to enter either a 5 digit zip code, or a 5 digit +4 zip code:
```
$string = "95030-1512";
$string =~ /^\d{5}$|^\d{5}-\d{4}$/;
print "the matched pattern in \$string: $&\n";
```
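Notice that each alternative in the zip code pattern has its own ^ and $. The | character separates everything to its left from everything to its right, so an anchor written once applies to only one alternative. The following sketch, using made-up inputs, compares the fully anchored pattern with a version where ^ appears only on the first alternative and $ only on the second.

```perl
@inputs = ("95030", "95030-1512", "9503", "95030abc");
$accepted = 0;
$loose = 0;

foreach $zip (@inputs) {
	# Each alternative carries its own ^ and $ anchors
	if ($zip =~ /^\d{5}$|^\d{5}-\d{4}$/) {
		print "$zip: accepted\n";
		$accepted++;
	}
	else {
		print "$zip: rejected\n";
	}

	# Here ^ belongs only to the first alternative and $ only to the second,
	# so anything that merely starts with five digits gets through
	if ($zip =~ /^\d{5}|\d{5}-\d{4}$/) {
		$loose++;
	}
}
print "fully anchored pattern accepted $accepted of the inputs\n";
print "partly anchored pattern accepted $loose of the inputs\n";
```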

Pattern Sequence Scalar Variables

So far we've used $& to refer to the most recently matched pattern. Perl also allows you to use parentheses in a pattern to capture several parts of the match in a single operation. For example:

$string = "This string contains the number 11.87";
$string =~ /(\d+)\.(\d+)/;
print "\$1 - the first pattern matched: $1\n";
print "\$2 - the second pattern matched: $2\n";

$1 refers to the text matched by the first set of parentheses (counting opening parentheses from the left-hand side of the pattern), $2 refers to the text matched by the second set, and so on.

Note that if you perform a subsequent pattern matching operation that succeeds, the values of $1, $2, etc. are overwritten by the values from the new pattern match. If you need these values, assign them to other variables immediately after performing the pattern match.
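
Here is one way the "save them immediately" advice might look in practice. It is only a sketch: the variable names $whole, $fraction, and $later are invented for the example.

```perl
$string = "This string contains the number 11.87";

if ($string =~ /(\d+)\.(\d+)/) {
	# Copy the captured pieces right away
	$whole = $1;
	$fraction = $2;
}

# A second successful match replaces the contents of $1
$string =~ /(\w+)/;
$later = $1;

print "saved values: $whole and $fraction\n";
print "\$1 after the second match: $later\n";
```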

Pattern Matching Options

When you specify a pattern, you also can supply options that control how the pattern is to be matched. We'll look at three pattern matching options:

g - match the pattern as many times as possible
i - ignore case
o - substitute any variables in the pattern only once

As you've seen, a pattern match operation ceases as soon as the pattern match specifications are satisfied once. You can use the "g" option to instead try to match the pattern as many times as possible:

$string = "Mike 59 hello";
@matches = $string =~ /.e/g;
print "@matches";

This is very handy if you want to find every occurrence of a pattern in a string. Also note how the result is used here. In scalar context (for example, as the condition of an if statement), the pattern match returns true or false, hence our reliance on $& to see the matched pattern. With the g option in list context (as here, where the result is assigned to an array), the pattern match returns a list of all the matched patterns.
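
The sketch below shows two common ways to use the g option: collecting every match into an array and counting them, and stepping through the matches one at a time with a while loop (in that case each pass finds the next match, and $& holds it). The variable names are assumptions for illustration.

```perl
$string = "Mike 59 hello";

# List context: every match is returned at once
@matches = $string =~ /.e/g;
$howmany = @matches;
print "found $howmany matches: @matches\n";

# Scalar context: each pass through the loop finds the next match
$digits = "";
while ($string =~ /\d/g) {
	print "found the digit $&\n";
	$digits = $digits . $&;
}
```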

The "i" option is straightforward - it ignores case when trying to match a pattern:

$string = "Mike 59 hello";
$string =~ /mIKe/i;
print "the matched pattern in \$string: $&\n";

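The circumstances where the "o" option is useful are quite infrequent, so you won't see it used very much. When a pattern contains a variable, Perl normally substitutes the variable's current value each time the pattern is used. The "o" option tells Perl to do that substitution only once, so later changes to the variable are ignored. In the example below, the pattern is built on the first pass through the loop, when $count is 1, so every pass tests your input against 1 even though $count keeps increasing: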

$count = 1;
while ($count < 5) {
	print "\nenter a number: ";
	$input = <STDIN>;
	$result = $input =~ /$count/o;
	print "your input: $input";
	print "the match: $result\n";
	$count++;
}
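
If you'd rather see the effect without typing input, the following sketch runs the same kind of loop against a fixed string, once without the o option and once with it, and collects what $& held on each pass. The names $target, $plain, and $once are invented for this example.

```perl
$target = "1234";
$plain = "";
$once = "";

foreach $count (1, 2, 3, 4) {
	# Without o, the current value of $count is used on every pass
	$target =~ /$count/;
	$plain = $plain . $&;

	# With o, the pattern is built on the first pass (when $count is 1)
	# and reused unchanged after that
	$target =~ /$count/o;
	$once = $once . $&;
}

print "without o the loop matched: $plain\n";
print "with o the loop matched: $once\n";
```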

The Substitution Operator

So far we've only been able to find patterns in strings. With the substitution operator s/// you can also replace patterns in strings. For example:

$string = "Mike 59 hello";
$string =~ s/59/72/;
print "the new value of \$string: $string\n";

This searches for the pattern "59" and, if it's found, replaces it with the string "72". You can also use it to effectively delete a pattern from a string by leaving the replacement empty, like this:

$string = "Mike 59 hello";
$string =~ s/59 //;
print "the new value of \$string: $string\n";

The substitution operator supports the g, i, and o pattern matching options.

Here's an example of using the "g" option with the substitution operator. This also illustrates another concept - you can use the pattern sequence variables, such as $1, in the replacement string.

$string = "Mike 59 hello";
$string =~ s/.(.e)/$1/g;
print "$string";

The substitution operator supports an additional option: "e", which allows you to have your replacement for a pattern be an expression rather than a string:

$string = "Mike 59 hello";
$string =~ s/59/$& * 2/e;
print "the new value of \$string: $string\n";
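
The "e" option can be combined with "g" and parentheses. As a rough sketch (the sample string is made up), this doubles every number in a string by evaluating $1 * 2 for each match:

```perl
$string = "apples 6 bananas 7 cherries 5";

# g finds every number; e evaluates the replacement as an expression
$string =~ s/(\d+)/$1 * 2/ge;

print "the new value of \$string: $string\n";
```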

Here's an example illustrating the usefulness of the substitution operator with CGI programming. If you have a file or string containing email addresses, and you want to not only print them to a page, but also add HTML hyperlinks to them, use the substitution operator:

$string = 'A string containing email addresses: mike@toppa.com joe@smith.com';
$string =~ s/[\w-]+\@[\w-]+\.\w+/<A HREF="mailto:$&">$&<\/A>/g; 
print "the new value of \$string:\n$string\n";

As you can see in this example, I also used the "g" option, which means Perl looks for every email address in the string and tries to replace it accordingly (and since $& holds the value of the last pattern matched, it always substitutes the correct email address).

The Translation Operator

The translation operator is similar to, but definitely distinct from, the substitution operator. Note that it is not covered in the Pattern Matching chapter of your book, but it is discussed in Chapter 9. This is an important one, as we will use it to decode HTML form submissions in a couple of weeks. Like the substitution operator, it's used to replace text in strings:

$string = "Mike 59 hello";
$string =~ tr/59/72/;
print "the new value of \$string: $string\n";

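However, it's different in that it doesn't match a pattern at all: it performs a character-for-character translation, replacing each character in the first list with the character at the same position in the second list, wherever it appears in the string. (In the example above, every 5 becomes a 7 and every 9 becomes a 2.) Your replacement list should normally have the same number of characters as the search list. If it's shorter, Perl reuses the last replacement character for the leftover search characters, so here M becomes J, and i, k, and e all become o: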

$string = "Mike 59 Mello";
$string =~ tr/Mike/Jo/;
print "the new value of \$string: $string\n";

The translation operator does not support the options available to the substitution operator. Instead, it supports these three options:

c - "complement" - translates all characters not specified
d - "delete" - deletes all specified characters
s - "squeeze" - replaces multiple identical output characters with a single character

Some examples:

This replaces each character that is not an "e" with an "x".

$string = "Mike 59 hello";
$string =~ tr/e/x/c;
print "the new value of \$string: $string\n";

The "d" option deletes every specified character:

$string = "Mike 59 hello";
$string =~ tr/Me//d;
print "the new value of \$string: $string\n";

The "s" option squeezes each run of identical output characters into a single character, which avoids the repeated "o" we saw in the "Jo" example:

$string = "Mike 59 Mello";
$string =~ tr/Mike/Jo/s;
print "the new value of \$string: $string\n";

Two additional things to note about the translation operator:

The special characters for pattern matching - *, ?, \d, etc. - are not supported, although you can use a hyphen to specify a range of characters, such as a-z or 0-9.
You can use the letter "y" in place of "tr" if you like.
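
Although pattern metacharacters don't work in a translation, a hyphen between two characters does specify a range. As a small sketch (the variable $masked is invented here), this masks every digit with tr and then uses the y spelling to change the mask character:

```perl
$string = "Mike 59 hello";

# 0-9 is a range covering all ten digits; the replacement list has
# only one character, so it is reused for every digit
$string =~ tr/0-9/#/;
$masked = $string;
print "after tr: $masked\n";

# y works exactly like tr
$string =~ y/#/*/;
print "after y: $string\n";
```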

Hashes

A significant limitation of arrays is that you can only directly access a list element if you know its index position. For example, if "mike" is in a list, and I want to access it, I either need to already know its position in the index, or I have to loop through every list element to find it.
Hashes (also known as associative arrays) provide a solution to this problem. Instead of a numeric index, you can assign whatever values you want to create your index. For example, instead of finding "mike" in an ordinary array by having to know that it's at index position 5, I could use a hash and instead get the value for the index "firstname".
Hashes consist of key/value pairs. In this example the key is "firstname" and the value is "mike". You can have as many key/value pairs as you like in an associative array.
Hashes are fundamental to CGI programming. When you receive data from an HTML form submission, it arrives as a set of key/value pairs in a hash - more on this in the next two classes.
Associative array syntax: associative arrays are distinguished from regular arrays by a leading % instead of an @. You can create an associative array by listing key/value pairs in sequence. For example, here's an associative array that records what type of food each item in my kitchen is (apples and pears are fruit, carrots are vegetables):
```
%food = ('apple', 'fruit', 'pear', 'fruit', 'carrot', 'vegetables');
```
A more intuitive method is to use the comma-arrow operator => (it's called this because it works like a comma but looks like an arrow).
```
%food = ('apple' => 'fruit',
	'pear' => 'fruit',
	'carrot' => 'vegetables');
```
Accessing and manipulating hash elements: you can look up the value for a specific key like this (this will print the word "fruit", indicating that an apple is a fruit):
```
print $food{'apple'};
```
Similarly, you can change the value of a hash element like this:
```
$food{'apple'} = "vegetables";
print $food{'apple'};
```
You can add an element to a hash simply by assigning it:
```
$food{'corn'} = "vegetables";
print $food{'corn'};
```
You can delete an element with the special hash function delete (this removes the key and its value):
```
delete($food{'apple'});
```
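If you want to confirm that a key is really gone (or check for a key before using it), Perl's exists function tests whether a key is in the hash. A minimal sketch, with $status and @remaining as made-up names:

```perl
%food = ('apple' => 'fruit',
	'pear' => 'fruit',
	'carrot' => 'vegetables');

delete($food{'apple'});

# exists is true only if the key is still in the hash
if (exists($food{'apple'})) {
	$status = "still there";
}
else {
	$status = "gone";
}
print "apple is $status\n";

@remaining = sort(keys(%food));
print "remaining keys: @remaining\n";
```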
Listing hash keys and values (and their order)
You can extract all the keys from a hash with the keys function. It returns a list consisting of all the keys:
```
@foodkeys = keys(%food);
print "@foodkeys\n";
```
You'll notice that the list in @foodkeys is not necessarily in the same order we used when creating %food. This is because Perl stores the key/value pairs in whatever order its internal hashing scheme finds most efficient for fast lookups, not in the order you added them. As far as we're concerned, the order is random. You can use sort to put the keys in alphabetical order:
```
@foodkeys = sort(keys(%food));
print "@foodkeys\n";
```
You can extract all the values from a hash with the values function. It returns a list consisting of all the values:
```
@foodvalues = values(%food);
print "@foodvalues\n";
```
Keys and values are not usually very interesting on their own. Typically you'll want to print a list that shows you the keys and their corresponding values. Use a while loop in conjunction with the each function. Successive calls to each result in a key/value pair being returned from the hash, until every key/value pair has been returned once.
```
while (($example, $foodtype) = each(%food)) {
	print "$example: $foodtype\n";
}
```
You can also use a foreach loop. This is less efficient than the while loop, as each key has to be looked up twice, but it allows you to use sort to alphabetize your output:
```
foreach $x (sort(keys(%food))) {
	print "$x: $food{$x}\n";
}
```
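Looping over a hash is also a convenient way to build a second hash. As a sketch of the idea, the following counts how many foods there are of each type; %tally is a hypothetical hash invented for this example, and the ++ on a key that doesn't exist yet simply starts it at 1.

```perl
%food = ('apple' => 'fruit',
	'pear' => 'fruit',
	'carrot' => 'vegetables',
	'corn' => 'vegetables');
%tally = ();

# Use each food's type as a key in %tally and count it
foreach $example (keys(%food)) {
	$type = $food{$example};
	$tally{$type}++;
}

foreach $type (sort(keys(%tally))) {
	print "$type: $tally{$type}\n";
}
```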

The %ENV System Hash

Every program that runs on your system has a special set of variables associated with it called environment variables. In Perl, you can access these environment variables by referring to the special system hash %ENV. You can use this short script to display the environment variables associated with your MS-DOS window:

while (($key, $value) = each(%ENV)) {
	print "$key: $value\n";
}

As we'll see in a class next month, this allows you to access a gold mine of information with CGI applications. With a CGI script, %ENV contains the environment variables for your web server, since it is the application that is running your script. You can use it to get information about the server, such as the server's hostname, IP address, and the name of your web server program. Most interestingly, you can access information about the user that activated your script: the hostname and IP address of their computer, what page they linked from to reach your script, and the name and version number of their browser.
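
As a preview, here is a sketch that looks up a few individual entries in %ENV rather than listing everything. The four names are standard CGI environment variables, but they are normally set only when a web server runs the script, so the sketch checks each one with exists and falls back to a placeholder. The %report hash is invented for the example.

```perl
# Standard CGI variable names; usually missing outside a web server
@names = ('SERVER_NAME', 'REMOTE_ADDR', 'HTTP_REFERER', 'HTTP_USER_AGENT');

foreach $name (@names) {
	if (exists($ENV{$name})) {
		$report{$name} = $ENV{$name};
	}
	else {
		$report{$name} = "(not set)";
	}
	print "$name: $report{$name}\n";
}
```

Run from the MS-DOS window, this will most likely print "(not set)" for every name; the real values appear once the script runs as a CGI program.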

More on Arrays

shift - a common task in working with arrays is to remove the first element of a list. For example, you'll often need to read the lines of a file into a Perl array. If the file was created in a spreadsheet, the first line may contain header information that you don't want in your array. After you've read the file into an array, shift gives you an easy way to remove that first line from your array. The remaining list elements are all then shifted one position to the left. An example:
```
@mylist = (1,2,3);
$firstval = shift(@mylist);
print "$firstval\n";
print "$mylist[0]\n";
```
The return value of shift, which here is assigned to $firstval, is the excised list element.
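The spreadsheet scenario might look something like this. It is only a sketch: the lines are hard-coded in place of a real file, and the data is made up.

```perl
# Pretend these lines were read from a file exported by a spreadsheet
@lines = ("name,age", "mike,59", "joe,41");

# Remove the header line and keep it in case we need it later
$header = shift(@lines);
$remaining = @lines;

print "header: $header\n";
print "data lines left: $remaining\n";
print "first data line: $lines[0]\n";
```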
unshift - is the opposite of shift - it adds elements to the front of a list. However, unlike shift, it allows you to affect more than one element at a time. For example:
```
@mylist = (1,2,3);
$count = unshift(@mylist, "mike", "toppa");
print "contents of \@mylist: @mylist\n";
print "number of elements in \@mylist: $count\n";
```
unshift returns the number of elements that exist in the revised list. unshift is also a useful function. For example, if you have an array that you want to write to a file for later use in a spreadsheet, you may want to add a line of header information as the first element of the list. unshift makes it easy for you to do this.
push - is probably the function you will use most frequently with arrays. push adds an element to the end of a list (this is generally preferable to unshift for adding elements to a list, as it requires less internal recalculation of the array by Perl to simply add an element to the end). Like unshift, it returns the number of elements in the resulting array. An example:
```
@mylist = (1,2,3);
$count = push(@mylist, "mike");
print "contents of \@mylist: @mylist\n";
print "number of elements in \@mylist: $count\n";
```
push will definitely be useful in your programming tasks. For example, if you have a CGI script that performs several operations (e.g. sending emails, writing to files, etc.), you'll want to build an array that consists of messages that tell you whether each operation was successful. push is the most straightforward way to accumulate information into a single collection of data.
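A rough sketch of that status-message idea follows. The three operations are not actually performed here; their outcomes are simply made up so you can see how the messages accumulate.

```perl
@messages = ();

# After each operation, record what happened (outcomes are invented)
push(@messages, "send email: ok");
push(@messages, "write log file: ok");
push(@messages, "update counter: FAILED");

$total = @messages;
print "$total operations attempted:\n";
foreach $message (@messages) {
	print "$message\n";
}
```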
pop - is the opposite of push - it removes an element from the end of an array, and then returns that element. For example:
```
@mylist = (1,2,3);
$popped = pop(@mylist);
print "$popped\n";
```
pop hasn't been necessary in most of the CGI programming I've done, but that doesn't mean it won't be important in yours!
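To see the four functions side by side, here is a short sketch showing that push and pop work on the end of an array while shift works on the front:

```perl
@mylist = (1,2,3);

# Add to the end, then take from each end
push(@mylist, 4);
$last = pop(@mylist);
$first = shift(@mylist);

print "popped from the end: $last\n";
print "shifted from the front: $first\n";
print "what is left: @mylist\n";
```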
Note that you cannot use these array functions on hashes. You wouldn't want to anyway: since you don't know the order in which the associative array is stored, the results would be unpredictable.

A comparative note on Perl arrays: Perl is very impressive in the simplicity with which it handles arrays. As we've seen in our examples, you can create a list and assign it to an array with a single line of code, add an element to an array with another line of code, and then in just one more line of code print all the array elements:

@mylist = ("a","b","c");
push(@mylist, "mike");
print "contents of \@mylist: @mylist\n";

This is a much more cumbersome task in many other programming languages. The following is the VBScript code necessary to do what we just did with three lines of Perl:

'declare a dynamic array, which is necessary
'since we plan to change its size
dim mylist()

'declare the size we want for now
redim mylist(2)

'assign the array values
mylist(0) = "a"
mylist(1) = "b"
mylist(2) = "c"

'to add a new element, it's wise to confirm the
'current size of the array
length = ubound(mylist)

'adjust the array to handle one more element, but
'do not delete the current elements
redim preserve mylist(length + 1)

'add the new element
mylist(3) = "mike"

'declare a variable to use as a counter
dim i

'print the array elements
for i = 0 to ubound(mylist)
	response.write(mylist(i) & " ")
next

Lab: Exercises

Write a script that accepts a line of input from the user and assigns it to a variable. Using the substitution operator, look for every occurrence of the word "bold" - in lower or upper case - and replace it with the string "<B>bold</B>". Print the updated string.
Write a short script that accepts a line of input from the user, assigns it to a variable, and uses the translation operator to convert all the text to lower case. Print the updated string.
Write a script that has the following as the first line of code.
$list = "oranges 6 apples 11 bananas 7 cherries 5";
Your script should create a hash from this string, with each of the fruit names being a key in the hash, and the number following the fruit name being its corresponding value (hint: one approach to this would involve the use of split and while, there may be other approaches as well). Then print the key/value pairs of the hash. Do you know the reason why they didn't print in the same order you assigned them?
