CIS 24: CGI and Perl Programming for the Web

Class 4 (10/2) Lecture Notes

Topics

  1. Review of Last Week's Lab Assignment
  2. Common Uses of Pattern Matching
  3. The Match Operator
  4. The $& System Variable
  5. Some Wildcards: the +, *, and ? Metacharacters
  6. Matching Alternatives: the [ and ] Metacharacters
  7. Escape Sequences for Metacharacters
  8. Pattern Anchors: the ^ and $ Metacharacters
  9. More Wildcards: Matching Any Letter or Number
  10. More Wildcards: Character-Range Escape Sequences
  11. Matching a Specified Number of Occurrences: the { and } Metacharacters
  12. Using Metacharacters with split
  13. Lab: Exercises



  1. Review of Last Week's Lab Assignment
    1. Write a script that:
      • Defines an array variable called @me which contains four elements: your first name, your last name, your age, and the name of the town you are from.
      • Uses a for loop which prints each element of the array.
      @me = ("Mike", "Toppa", 30, "Newport");
      
      for ($count = 0; $count <= $#me; $count++) {
      	print "$me[$count]\n";
      }

    2. Write a script that:
      • Assigns the value 0 to a variable called $sumAll
      • Assigns the value 0 to a variable called $sumEven
      • Uses a variable called $count to iterate a for loop 10 times.
      • In the for loop, adds the current value of $count to $sumAll
      • In the for loop, adds the current value of $count to $sumEven if $count is an even number.
      • After the for loop is completed, prints the values of $sumAll and $sumEven
      $sumAll = 0;
      $sumEven = 0;
      
      for ($count = 1; $count <= 10; $count++) {
      	$sumAll += $count;
      	$sumEven += $count if ($count % 2 == 0);
      }
      
      print "The value of \$sumAll is $sumAll\n";
      print "The value of \$sumEven is " . $sumEven;

    3. Write a script that uses the sort and reverse functions to put the elements of the array @me (from question 1) in reverse alphabetical order. Then use a foreach loop to print each element.
      @me = ("Mike", "Toppa", 30, "Newport");
      @me = reverse(sort(@me));
      
      foreach $item (@me) {
      	print "$item\n";
      }

    4. A common task in working with arrays is looking for duplicate values. For example, you may want to know which pages on your web site have been visited. A web server will write the name of a page to a log file every time that page is visited. You can then read the web server log file into an array, and use that array as a basis for creating a report of activity on your web site. To introduce you to this kind of work, write a script that:
      • Has the following statements as the first two lines of code:
        @pages = qw(index.html chapter1.html chapter4.html chapter8.html chapter1.html index.html chapter4.html chapter3.html);
        $previous = "";
      • Defines a new array called @pagesSorted that is an alphabetically sorted copy of @pages
      • Uses a foreach loop to see if there are any duplicate values in the array. To do this, assign the value of the current array element to the variable $previous as the last statement inside the loop. Then, as the first statement inside the loop, you can compare the current array element to the value of $previous.
      • Prints out each element of the array, excluding the duplicates. That is, if an element of the array has the same value as a previous element, don't print it (I leave it up to you to figure this part out!)
      @pages = qw(index.html chapter1.html chapter4.html chapter8.html chapter1.html index.html chapter4.html chapter3.html);
      $previous = "";
      
      @pagesSorted = sort(@pages);
      
      foreach $page (@pagesSorted) {
      	print "$page\n" unless ($page eq $previous);
      	$previous = $page;
      }

    5. Using the localtime function, write a script that indicates how many days there have been between December 25, 1999 and today. See p. 66 of your book, and Section V above, on how to use localtime. Helpful Hints: Last year was not a leap year, so there were 365 days, and December 25 was the 359th day. Also remember that the first day of the year according to Perl is day 0 (so to Perl, the last day of the year was day 364, and December 25 was day 358).
      ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime;
      
      $daysLastYear = 365 - 359;
      $daysThisYear = $yday + 1;
      $daysSince = $daysLastYear + $daysThisYear;
      print "There have been $daysSince days between December 25, 1999 and today.\n";

  2. Common Uses of Pattern Matching
    Pattern matching is one of the most powerful features of Perl. With Perl you can match just about any conceivable text pattern. With CGI programming, this is most commonly used for:

    1. Extracting and analyzing data from text files
      Examples: analyzing web server log files to generate reports on web site usage, building search engines

    2. Search and replace
      Examples: decoding HTML form submissions (we're learning pattern matching now so we can do this in a couple of weeks), automatically adding HTML formatting to text files (adding <P> tags where there are linefeeds), adding hyperlinks to text (if a file contains the string http://www.toppa.com, you can use Perl to have it replaced by <A HREF="http://www.toppa.com">http://www.toppa.com</A>)

    3. Data entry validation
      Example: if you have a form field that asks for someone's email address, you can check to see if what they typed in contains an "@" and one or more "." in the appropriate places. You can then ask them to fill out the field again if they enter an invalid email address.
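
    As a rough sketch of the data entry validation idea, a script might check an address like this. The address is a made-up sample, and the backslashes in the patterns are explained later in these notes. This check only confirms that an "@" and a "." are present somewhere, not that they are in sensible places:

```perl
# Hypothetical sample input. Single quotes keep Perl from
# treating @example as an array variable.
$email = 'student@example.com';

# A very loose check: an "@" somewhere and a "." somewhere
if ($email =~ /\@/ && $email =~ /\./) {
	$verdict = "looks plausible";
}
else {
	$verdict = "looks invalid";
}

print "$email $verdict\n";
```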

  3. The Match Operator
    Your book introduces pattern matching in an unusual way, by presenting you with the match operator m//, but not showing you how to use it to find patterns in ordinary variables until the end of the chapter. I'm going to present the match operator in a different way, which you'll hopefully find a bit more intuitive than the book's presentation.
    1. The =~ operator tests whether a string contains a pattern. The pattern goes between the // characters, which is shorthand for the m// form your book uses (the m is optional when the delimiters are slashes). For example:
      $string = "abcde";
      $result = $string =~ /abc/;
      print "$result\n";

      In this example, the number "1" is printed to the screen. This is because the variable $string contains the pattern "abc", so the =~ operator returns true. The result of the pattern matching test is assigned to $result, and since "true" is represented by "1" in Perl, the number "1" is printed to the screen.

    2. The !~ operator is the opposite of =~ since it tests to see if a pattern is not matched:
      $string = "abcde";
      $result = $string !~ /xyz/;
      print "$result\n";

      In this case the number "1" is again printed to the screen, because the pattern "xyz" was not found in the variable $string.
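
    In practice you will rarely store the result in a variable. A minimal sketch, reusing the same sample string, of testing a match directly in an if statement, and of what a failed match returns (the variable name $found is just for illustration):

```perl
$string = "abcde";

# A pattern match can be used directly as the condition of an if statement
if ($string =~ /bcd/) {
	$found = "yes";
}
else {
	$found = "no";
}
print "Does \$string contain bcd? $found\n";

# A failed match returns an empty string (which Perl treats as false),
# so nothing appears between the brackets
$result = $string =~ /xyz/;
print "Result of the failed match: [$result]\n";
```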

  4. The $& System Variable
    A system variable is a variable that has a special meaning in Perl. Perl can assign a value to a system variable, even if your script doesn't explicitly ask it to do so. The $& system variable is a good example. It contains the text that was most recently matched. For example:
    $string = "abcde";
    $string =~ /abc/;
    print "$&\n";

    If the pattern was not matched (e.g. if we tried to match "mnop"), then the value of $& is not changed. $& is very handy when you need to re-use a matched pattern (e.g. if you want to print it).
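
    A short sketch of that behavior, copying $& into ordinary variables after each attempt (the names $afterSuccess and $afterFailure are only for illustration):

```perl
$string = "abcde";

$string =~ /abc/;       # succeeds, so $& is set to "abc"
$afterSuccess = $&;

$string =~ /mnop/;      # fails, so $& keeps its old value
$afterFailure = $&;

print "after the successful match: $afterSuccess\n";   # abc
print "after the failed match: $afterFailure\n";       # abc again
```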

  5. Some Wildcards: the +, *, and ? Metacharacters
    The + character has a special meaning when used in the context of pattern matching. It means "try to match one or more of the preceding character." For example:

    $string = "abbbcde";
    $string =~ /ab+/;
    print "$&";

    The * character is similar, but it attempts to match zero or more occurrences of the preceding character. For example:

    $string = "abcde";
    $string =~ /aq*b/;
    print "the matched pattern in \$string: $&\n";
    $string2 = "aqqqbcde";
    $string2 =~ /aq*b/;
    print "the matched pattern in \$string2: $&\n";

    + and * are greedy. That is, they will try to match as many characters as possible. For example, + won't stop after it's matched "ab". It will only stop when it's matched every consecutive "b" in the string.
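
    To see the greediness for yourself, here is a small sketch that saves $& after each match (the variable names are only for illustration):

```perl
$string = "abbbcde";

$string =~ /ab+/;
$plusMatch = $&;        # "abbb": + takes every consecutive b

$string =~ /ab*/;
$starMatch = $&;        # also "abbb": * is just as greedy

$string2 = "acde";
$string2 =~ /ab*/;
$zeroMatch = $&;        # "a": * is satisfied with zero b's

print "ab+ matched: $plusMatch\n";
print "ab* matched: $starMatch\n";
print "ab* matched in \$string2: $zeroMatch\n";
```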

    ? is similar to * but it matches only zero or one occurrence of the preceding character:

    $string = "abcde";
    $string =~ /aq?b/;
    print "the matched pattern in \$string: $&\n";
    
    $string2 = "aqqqbcde";
    
    if ($string2 =~ /aq?b/) {
    	print "the matched pattern in \$string2: $&\n";
    }
    
    else {
    	print "the pattern for \$string2 was not matched.\n";
    }

    The pattern match attempt on $string2 will return false, since there is more than one "q" between "a" and "b". Note the use of the if...else statement in this code. The pattern match on $string2 failed, but if we did not use an if conditional before trying to print $&, we would have simply printed the value of the last successful match, which was for $string, not $string2. Be careful of this when using $& - it contains the last successful match, which is not necessarily the same as the most recent attempt to match, since the most recent attempt may have failed, as it did here!
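
    A typical use of ? is accepting two spellings of a word. The following sketch, with made-up sample words, tests each match inside an if so that a stale $& is never printed:

```perl
$matched = 0;

foreach $word ("color", "colour", "colouur") {
	# u? allows zero or one "u" between the "o" and the "r"
	if ($word =~ /colou?r/) {
		print "$word: matched $&\n";
		$matched++;
	}
	else {
		print "$word: no match\n";
	}
}

print "$matched of the words matched\n";
```

    The first two words match. The third does not, since it has two consecutive "u" characters.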

  6. Matching Alternatives: the [ and ] Metacharacters
    The [] characters enable you to define patterns that match one of a group of alternatives. For example:
    $string = "abcde";
    $string =~ /a[Bb]c/;
    print "the matched pattern in \$string: $&\n";
    $string2 = "aBcde";
    $string2 =~ /a[Bb]c/;
    print "the matched pattern in \$string2: $&\n";

    You can also use pattern matching characters in combination with each other:

    $string = "abBbbBcde";
    $string =~ /a[Bb]+c/;
    print "the matched pattern in \$string: $&\n";
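
    The alternatives do not have to be upper and lower case versions of the same letter. As an illustration with made-up sample words, this sketch accepts either "gray" or "grey":

```perl
$matched = 0;

foreach $word ("gray", "grey", "groy") {
	# [ae] matches exactly one character, which must be an "a" or an "e"
	if ($word =~ /gr[ae]y/) {
		print "$word: matched\n";
		$matched++;
	}
	else {
		print "$word: no match\n";
	}
}
```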

  7. Escape Sequences for Metacharacters
    If you want to literally match a *, + or other special character in a string, you'll need to precede it with a backslash \ which is the escape character. For example:
    $string = "ab+cde";
    $string =~ /ab\+c/;
    print "the matched pattern in \$string: $&\n";

    This also applies to the other metacharacters that we'll discuss tonight and next week.
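
    Forgetting the backslash does not cause an error. It silently changes what the pattern means. A small sketch of the difference, using a made-up sample string:

```perl
$string = "2+2";

# Unescaped, + is a metacharacter: /2+2/ looks for two or more 2s in a row
if ($string =~ /2+2/) {
	$unescaped = "matched";
}
else {
	$unescaped = "did not match";
}

# Escaped, \+ is an ordinary plus sign
if ($string =~ /2\+2/) {
	$escaped = "matched";
}
else {
	$escaped = "did not match";
}

print "without the backslash: $unescaped\n";
print "with the backslash: $escaped\n";
```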

  8. Pattern Anchors: the ^ and $ Metacharacters
    The pattern anchors ^ and $ ensure that a pattern is only matched at the start or end of a string. For example, the following code will only match "abc" if it appears at the very beginning of the string:
    $string = "abcd";
    $string =~ /^abc/;
    print "the matched pattern in \$string: $&\n";

    $ is used to match a pattern at the end of a string. You can use them in combination to force matching of the entire string:

    $string = "abcd";
    if ($string =~ /^abc$/) {
    	print "the matched pattern in \$string: $&\n";
    }
    else {
    	print "the pattern for \$string was not matched.\n";
    }

    In this case, the pattern match fails since "d" is the last character in the string, not "c".
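
    The sketch below compares the three kinds of anchoring on a few made-up answers. The counter variables are only there to tally the results:

```perl
$startCount = 0;
$endCount = 0;
$exactCount = 0;

foreach $answer ("yes", "yes please", "oh yes") {
	# ^ anchors the pattern to the start of the string
	$startCount++ if ($answer =~ /^yes/);

	# $ anchors the pattern to the end of the string
	$endCount++ if ($answer =~ /yes$/);

	# Both together: the whole string must be "yes"
	if ($answer =~ /^yes$/) {
		print "\"$answer\" is exactly yes\n";
		$exactCount++;
	}
}

print "start with yes: $startCount\n";
print "end with yes: $endCount\n";
print "exactly yes: $exactCount\n";
```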

  9. More Wildcards: Matching Any Letter or Number
    You can use ranges to match any letter or number. To match numbers:
    $string = "ab4cde";
    $string =~ /[0-9]/;
    print "the matched pattern in \$string: $&\n";

    To match lowercase letters:

    $string = "123a456";
    $string =~ /[a-z]/;
    print "the matched pattern in \$string: $&\n";

    You can use [A-Z] to match uppercase letters. Lastly, you can use these in combination with each other, and with the other special characters to, for example, match any consecutive combination of letters or numbers:

    $string = "Mike59:::";
    $string =~ /[0-9a-zA-Z]*/;
    print "the matched pattern in \$string: $&\n";
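
    Ranges can also be placed one after another to describe a structure such as "one capital letter followed by digits". A brief sketch, using a made-up string:

```perl
$string = "Room B12";

# One capital letter followed by one or more digits
if ($string =~ /[A-Z][0-9]+/) {
	$roomCode = $&;
	print "room code: $roomCode\n";
}

# Ranges are case-sensitive: [a-z]+ skips the capital R
if ($string =~ /[a-z]+/) {
	$lowercase = $&;
	print "first run of lowercase letters: $lowercase\n";
}
```

    The first pattern skips the "R" in "Room", because the character after it is not a digit, and matches "B12". The second matches "oom".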

  10. More Wildcards: Character-Range Escape Sequences
    Page 100 of Teach Yourself Perl lists the character range escape sequences. They are:

    \w matches any "word" character (i.e. an underscore, or any letter or digit)
    \W matches any character that is not a "word" character
    \d matches any digit character
    \D matches any character that is not a digit
    \s matches any "whitespace" character (i.e. a space, tab, newline, carriage return, or form feed)
    \S matches any non-whitespace character

    We can now rewrite the previous example more concisely (the only difference is that \w also matches an underscore):

    $string = "Mike59:::";
    $string =~ /\w*/;
    print "the matched pattern in \$string: $&\n";

    Note that if we change the example to look for non-word characters instead, nothing is printed after the colon:

    $string = "Mike59:::";
    $string =~ /\W*/;
    print "the matched pattern in \$string: $&\n";

    You might expect Perl to match the 3 colons at the end, since they are the first "non-word" characters in the string. Why doesn't it? Remember that the * metacharacter is looking for zero or more non-word characters. All pattern match tests read the string from left to right. So, the pattern match sees the first character "M" and performs the test: "Have I found zero or more non-word characters?" The answer is yes: zero non-word characters have been found. The test is satisfied, so the pattern matching ceases. You can match the ":::" pattern by changing the test to \W+ which looks for one or more non-word characters.
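
    The following sketch runs several of the escape sequences against the same string, so you can compare what each one finds (the variable names are only for illustration):

```perl
$string = "Mike59:::";

$string =~ /\w+/;       # one or more word characters
$word = $&;

$string =~ /\d+/;       # one or more digits
$digits = $&;

$string =~ /\W+/;       # one or more non-word characters
$punctuation = $&;

$string =~ /\W*/;       # zero or more: succeeds at once with an empty match
$empty = $&;

print "word characters: $word\n";              # Mike59
print "digits: $digits\n";                     # 59
print "non-word characters: $punctuation\n";   # :::
print "zero-or-more match: [$empty]\n";        # []
```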

  11. Matching a Specified Number of Occurrences: the { and } Metacharacters
    The ? character will match zero or one occurrence of a character, and + and * will match as many consecutive occurrences as possible. If you want to match a specific number of occurrences, use the { } metacharacters. You can specify a minimum and a maximum number of characters to match. For example:
    $string = "abbbbcd";
    $string =~ /ab{1,3}/;
    print "the matched pattern in \$string: $&\n";

    To specify an exact number of occurrences, provide a single number:

    $string = "abbbbcd";
    $string =~ /ab{2}/;
    print "the matched pattern in \$string: $&\n";

    To specify a minimum, but not a maximum, leave off the upper bound:

    $string = "abbbbcd";
    $string =~ /ab{1,}/;
    print "the matched pattern in \$string: $&\n";

    Lastly, to specify a maximum but not a minimum, use a 0 as the lower bound:

    $string = "acd";
    $string =~ /ab{0,2}/;
    print "the matched pattern in \$string: $&\n";
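
    Combined with the pattern anchors and \d, an exact count is handy for data entry validation. As a sketch, assuming a hypothetical form field that must hold a five-digit zip code:

```perl
$valid = 0;

foreach $zip ("02840", "2840", "028401", "0284o") {
	# ^ and $ force the whole string to be exactly five digits
	if ($zip =~ /^\d{5}$/) {
		print "$zip is a valid zip code\n";
		$valid++;
	}
	else {
		print "$zip is not a valid zip code\n";
	}
}
```

    Only "02840" passes. The others are too short, too long, or contain a letter.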

  12. Using Metacharacters with split
    The use of the // characters should look familiar from our previous discussion of split. You can also use metacharacters with split. For example:
    $string = "Mike::Joe:Mary:::Fred";
    @names = split(/:+/, $string);
    print "@names\n";
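
    The [ ] metacharacters work with split too, which helps when a line may use more than one kind of separator. A minimal sketch, using a made-up string:

```perl
$string = "red,green;;blue,;yellow";

# Split on any run of one or more commas or semicolons
@colors = split(/[,;]+/, $string);

# $#colors is the index of the last element, so add 1 to get the count
$howMany = $#colors + 1;

print "@colors\n";                      # red green blue yellow
print "number of colors: $howMany\n";   # 4
```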

  13. Lab: Exercises
    1. Write a script that:
      • Prints the message "Ask me a question politely"
      • Accepts input from the user and assigns it to the variable $question
      • If $question contains the word "please", print the message "Thank you for being polite", otherwise print the message "That was not very polite"

    2. Write a script that accepts a line of input from the user. Your script should then print the number of words that the user entered. You can do this by using the split function to split the line of input on the whitespace between the words. Your script must be able to handle multiple whitespace characters in a row (i.e. spaces or tabs).

    3. Write a script that has the user enter the name of a scalar or array variable. Your script should check the variable name to make sure it is valid, and then print a message accordingly. That is, your script should check to make sure the first character is a $ or @, that the second character is a letter or underscore, and that all of the subsequent characters are letters, underscores, or numbers. Remember that both capital and lower case letters are acceptable in variable names.
