SJCC CIS 24: Class 1 (9/11) Lecture Notes

CIS 24: CGI and Perl Programming for the Web

Class 1 (9/11) Lecture Notes

Topics

Class Overview
Web Servers, Web Browsers, & Web Pages
What is CGI Programming?
What Are the Alternatives to CGI?
What is Perl, and Why Should I Use It For CGI Programming?
Lab: Installing Perl, and How to Use It

Class Overview

Enrollment: The standard maximum number of students in an SJCC class is 35. I will allow up to 40. I have to limit the number of students because of fire regulations, and because it's important that each student has access to a lab computer during the lectures and labs. Tonight we will randomly select students to fill spots not held by students who are already enrolled (we'll do this part-way through the class, so you can listen for a while and decide if this class is for you).
This course is intended for the beginner who wants to enter the world of programming interactive applications for the World Wide Web and quickly develop his or her skills. CIS 41 (Introduction to Computers) and CIS 42 (Program Design) are prerequisites for this course. The skills taught in those classes provide the essential foundations for keeping up with the early weeks of this class. Familiarity with HTML (Hypertext Markup Language) is helpful but not required.
About Your Instructor: My name is Michael Toppa. I'm a software engineer with Ask Jeeves. My primary responsibility is the Jeeves Advisor, a web-based application that we license to clients. You can see a couple of examples of the Advisor in action at etown (click on the link for Ask Ida) and the Nike on-line store (click on the link for the Running Shoe Advisor). Although the Advisor is not a Perl/CGI application, I used Perl extensively at my previous jobs - as the Webmaster for Phoenix Technologies and a Web Developer at E*TRADE.
About the Course: CGI - the "Common Gateway Interface" - is the standard method used by web servers to interact with other applications, such as databases. Using CGI, your web server acts as a "gateway" between your web site visitors and applications you want to make available to them. Because of its ease of use and versatility, Perl is one of the most popular languages for creating CGI applications. Perl is an excellent choice for tasks such as processing text input from web forms (such as user feedback forms) and serving as a "glue" between your web server and other applications (such as email servers and databases).
In this class you will not only learn how to create CGI applications yourself, but you will also gain an understanding of how web server software works, and the methods by which data is communicated across the World Wide Web.
While successful web programming requires some amount of technical skill, it is just as important to approach programming tasks with creative, problem solving ideas. This course is structured with an emphasis on providing the skills necessary to successfully take on the real-world programming challenges you will face after you leave the classroom.
Class Website: I maintain a web site for the class at http://www.toppa.com/cis24/. This site contains the course syllabus, each week's lecture notes, lab assignments, and weekly announcements. We'll be using this site every week.
Class Schedule: The first 5 classes cover the basics of Perl, with a particular focus on the aspects most relevant to CGI programming. During class #4 there will be a one hour Quiz.
Classes #6 through #9 cover the basics of how to create Perl/CGI programs that can process information that users submit from web forms. During class #9 there will be a two hour Midterm Exam.
Classes #10 through #14 focus on more advanced Perl/CGI topics: building web applications that involve multiple sequential user interactions, using "cookies", interacting with databases, sending email, and more. There will be a Final Exam at the designated time during final exam week.
Assignments, Tests, & Grading: Lectures will be followed by Lab periods where you can work on Perl and CGI exercises that I'll assign. Although these assignments will not be graded, you will not be able to keep up with the class if you do not do them. A concrete incentive to do them is that one question on each of the tests will come directly from a previous lab assignment.
The tests will not emphasize rote learning or memorization - there will be no Scan-Tron tests, and you will not be asked to recite definitions of obscure terms. In fact, you will be allowed to use your notes and books during the tests.
The exams will consist of real-world programming challenges. You will need to draw upon the skills you have learned in the class lectures and in your weekly lab assignments to solve these problems during the time available (1 hour for the Quiz, and 2 hours for the Midterm and Final). Remember that one question on each test will come directly from a weekly lab assignment.
The Book - Teach Yourself Perl in 24 Hours: the book should be in stock now at the SJCC book store. You also should be able to find it at Barnes & Noble, Borders, etc. This book is required for the course - it's one of the best books I've come across for learning Perl.

Web Servers, Web Browsers, & Web Pages

What Is a Web Server, and Why Do I Need One? "Web server" is a term that is used loosely, and actually means one of two different things, depending on the context:
1. It can be used to refer to web server software, such as Apache, Microsoft's Internet Information Server (IIS), Netscape's web server software packages, etc. When I use the term in this class, this is usually what I mean. You need a web server because you cannot use or test your CGI programs unless you have web server software to run them! Later on in the semester we will use the Apache web server. I picked Apache because it's available for Windows and Unix, it's free, and it's popular (there are more Web sites using Apache than any other Web server).
2. "Web server" also can refer to a computer that is connected to the World Wide Web, and has web server software running on it. This is how most people use the term. So when somebody says, "How much RAM does your web server have?" he or she is referring to the computer that runs your web server software. It's important to note that you can have a web server, but not connect it to the World Wide Web. For example, you can install the Apache web server on your computer at home. Even though no one out on the Web can see it, you can use it to run and test CGI programs that you're developing.
How Do Web Servers and Web Browsers Communicate? When you type a Web address into your browser, such as www.yahoo.com, you are initiating an HTTP (HyperText Transfer Protocol) request to the Yahoo web server. It's called HTTP because it's the protocol - the method agreed upon by makers of Web servers and Web browsers - used for requesting and sending HTML (HyperText Markup Language) documents.
The job of Web servers is to respond to requests from Web browsers. Web servers respond to these requests by sending HTML documents. A Web browser receives the document, saves it temporarily on your computer's hard drive, deciphers it, and then displays it to you in the browser window.
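To make the request and response a little more concrete, here is a rough sketch in Perl of the text involved. It only builds and examines strings; it does not contact any server. The function names build_request and status_code are my own inventions for this illustration, and the request shown is the simplest HTTP/1.0 form, not everything a real browser sends.

```perl
#!perl
use strict;
use warnings;

# Build the text of a minimal HTTP/1.0 GET request for a host and path.
# Each line ends with a carriage return and linefeed, and a blank line
# marks the end of the request.
sub build_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "\r\n";
}

# Pull the three-digit status code out of the first line of a response.
# For example, "HTTP/1.0 200 OK" gives 200. Returns undef if the line
# does not look like an HTTP status line.
sub status_code {
    my ($status_line) = @_;
    return $status_line =~ m{^HTTP/\d\.\d\s+(\d{3})\b} ? $1 : undef;
}

print build_request("www.yahoo.com", "/");
print "Status: ", status_code("HTTP/1.0 200 OK"), "\n";
```

The first print shows what a browser would send to ask for the home page; the second shows how the server's reply starts with a status line before the HTML document itself.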
The document sent to the Web browser might be a "static" document. That is, a document that is always the same for anyone who requests it. These pages are usually created by a person, and must be manually edited in order to change their content. Other documents are "dynamic" - their contents can vary depending on the circumstances. A simple example is a web page that displays the current time whenever it is requested. These pages are often created "on-the-fly" by a program that's called by your web server. Using CGI is the most common method for creating web pages containing dynamically generated content.
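The "current time" page mentioned above might look something like the following as a Perl CGI program. Treat it as a minimal sketch: the function name time_page is hypothetical, and we won't actually run CGI programs through a web server until later in the semester.

```perl
#!perl
use strict;
use warnings;

# Build a complete CGI response: a Content-type header, a blank line,
# and then an HTML page that includes the time passed in.
sub time_page {
    my ($now) = @_;
    my $stamp = localtime($now);   # a readable date and time string
    return "Content-type: text/html\n\n"
         . "<html><head><title>Current Time</title></head>\n"
         . "<body><p>The time on the server is $stamp.</p></body></html>\n";
}

# The web server runs the script and sends whatever it prints to the browser.
print time_page(time);
```

Every request runs the program again, so every visitor sees a different page even though nobody edited anything by hand.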

What is CGI Programming?

A Typical CGI Interaction
The best way to begin answering this question is to provide an example of what goes on when someone uses a CGI application. Here's an example of someone making a purchase on-line:
1. When the user is ready to buy the items he selected on the Web site, he goes to a "Checkout" Web page containing a form that asks for his name, address, etc. The page containing the form is usually a "static" page - it looks the same for everybody.
2. After the user fills out the form and clicks the "submit" button, the Web server directs the data to a CGI program. The program may examine the data to make sure the user provided all the information necessary to process the order.
3. The CGI program then sends the user's data to a database.
4. After the database receives the information, it creates an Order Number for the user, and sends it back to the CGI program.
5. The CGI program then creates a web page containing the user's Order Number and thanking him for the order.
6. The CGI program hands the web page off to the Web server, which then sends it to the user.
So, when you write a CGI program, you're often creating the "glue" between the Web server and other applications that you want to make available for use on the Web.
Note that simpler CGI programs do not communicate with external applications. These CGI programs receive a request from a Web server and generate a response completely on their own. After we learn the basics of Perl, we'll start our CGI programming with these simpler types.
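The six-step checkout interaction described above can be sketched in Perl roughly as follows. This is an illustration under several assumptions, not a finished program: I assume the form is submitted with the GET method (so the data arrives in the QUERY_STRING environment variable), the field names "name" and "address" are made up, and save_order is a stand-in that returns an invented order number instead of talking to a real database.

```perl
#!perl
use strict;
use warnings;

# Step 2: turn "name=Pat+Lee&address=..." into a hash of field values.
sub parse_form {
    my ($data) = @_;
    my %form;
    foreach my $pair (split /&/, $data) {
        next unless length $pair;
        my ($key, $value) = split /=/, $pair, 2;
        $value = "" unless defined $value;
        foreach ($key, $value) {
            tr/+/ /;                               # "+" means a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # "%2C" means ","
        }
        $form{$key} = $value;
    }
    return %form;
}

# Step 2 (continued): list the required fields the user left blank.
sub missing_fields {
    my (%form) = @_;
    return grep { !defined $form{$_} || $form{$_} !~ /\S/ } qw(name address);
}

# Steps 3-4: a stand-in for the database, which is assumed to hand back
# an order number. A real program would talk to a real database here.
sub save_order {
    my (%form) = @_;
    return 1001;   # hypothetical order number
}

# Step 5: build the page that goes back to the user.
sub response_page {
    my (%form) = @_;
    my @missing = missing_fields(%form);
    my $body = @missing
        ? "Please go back and fill in: " . join(", ", @missing) . "."
        : "Thank you for your order! Your Order Number is " . save_order(%form) . ".";
    return "Content-type: text/html\n\n<html><body><p>$body</p></body></html>\n";
}

# Step 6: print the page; the Web server passes it on to the browser.
my $query = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : "";
print response_page(parse_form($query));
```

Notice that most of the program is "glue": it reshapes data on its way from the Web server to the database and back again.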
CGI - The Common Gateway Interface
The first word represented in the acronym CGI is "Common" - which means it's supported by virtually every web server, and it works with any browser that can submit a form. The second word is "Gateway" - if you want an application such as a database to be available to your Web site visitors, CGI provides the gateway between the Web server and the database. The third word is "Interface" - which refers to the specifics of how the gateway works.
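One concrete piece of the "Interface" is a set of environment variables that the Web server fills in before it starts your program. The sketch below prints a few of them. The variable names are part of the CGI standard, but the function name describe_request is just something I made up for this example.

```perl
#!perl
use strict;
use warnings;

# Report a few of the environment variables defined by the CGI standard.
# Variables the Web server did not set are shown as "(not set)".
sub describe_request {
    my (%env) = @_;
    my @names = qw(REQUEST_METHOD QUERY_STRING CONTENT_LENGTH REMOTE_ADDR);
    my $report = "";
    foreach my $name (@names) {
        my $value = defined $env{$name} ? $env{$name} : "(not set)";
        $report .= "$name = $value\n";
    }
    return $report;
}

print "Content-type: text/plain\n\n";
print describe_request(%ENV);
```

If you run this from the command prompt instead of through a Web server, expect most of the values to come back as "(not set)", because no server was there to provide them.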
Differences Between CGI Programming and Other Kinds of Programming
CGI programming is different from other kinds of programming in three important ways:
1. Security concerns: the HTML code used to create a Web form (which is the typical entry point to a CGI program) is available for anyone to read and try to find ways to break into your CGI program. Data from a form submission is transmitted across the Internet and could be intercepted by someone with enough motivation and skill. This is very different from, for example, a database application created in Microsoft Access, where all the data transfer occurs within the application itself.
2. Data passing methods: a CGI program is often used as the "glue" between a Web server and some other application. In general, CGI programming occurs in an environment of heterogeneous communication protocols. That is, what a CGI program has to do to communicate with a database may be extremely different from what it has to do to communicate with an email server. Of course, it also has to communicate with your Web server.
3. CGI is stateless: a CGI application is started up when a request is received from the Web server, and then it completely shuts down after it sends its response. It has no memory of previous events. For example, if you create a form where someone enters her first name, and then your CGI application receives that data and generates a form for her to enter her last name, it will not automatically remember the first name that was just provided. You need to include code that will save information between interactions when maintaining "state" is important.
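One common way to carry information from one request to the next is a hidden form field. Here is a minimal sketch of the first name / last name example from point 3, assuming the first name has already been read from the previous form. The script name name.pl and the field names "first" and "last" are hypothetical.

```perl
#!perl
use strict;
use warnings;

# Make text safe to place inside HTML, so that a name containing
# characters like < or " cannot break the page.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build the second form. The first name from the previous request rides
# along in a hidden field, so the next run of the program gets it back.
sub last_name_form {
    my ($first_name) = @_;
    my $safe = escape_html($first_name);
    return "Content-type: text/html\n\n"
         . qq{<html><body>\n}
         . qq{<form action="name.pl" method="get">\n}
         . qq{<input type="hidden" name="first" value="$safe">\n}
         . qq{Last name: <input type="text" name="last">\n}
         . qq{<input type="submit">\n}
         . qq{</form>\n}
         . qq{</body></html>\n};
}

print last_name_form("Maria");
```

The program still forgets everything when it exits; the "memory" lives in the page sent to the browser. Keep point 1 in mind too: anyone can read or change a hidden field, so never trust its contents blindly.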

What Are the Alternatives to CGI?

"Client-side" Programming Alternatives: CGI programming is "server-side" - the dynamic generation of the Web page content occurs on the server. With server-side programming, nothing in a user's Web browser can change without making a new call to the Web server. For example, if a user enters numbers into a form, and you want to add them up, you would send the numbers to the server, your CGI program would add up the numbers, and create a new Web page containing the total.
This is different from "client-side" programming - typically done with JavaScript - which is run within a user's Web browser. Client-side programming allows for changes within a user's browser without a new call to the Web server. The numbers entered into the form could be added, and a total displayed, without having to get a new page from the Web server.
Client-side programming is useful for creating flashy looking Web pages and doing relatively simple data manipulations, but it does not allow you to interact with any resources outside of the Web browser. For example, a client-side JavaScript program cannot directly communicate with a database on the Web Server.
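For comparison, the server-side version of the number-adding example could look something like this in Perl. It is only a sketch: I assume the numbers arrive in the QUERY_STRING environment variable (as in "a=2&b=3&c=10"), and the function names and the sample data used when no query string is present are made up.

```perl
#!perl
use strict;
use warnings;

# Add up every value that looks like a number; anything else is skipped.
sub total_of {
    my (@values) = @_;
    my $total = 0;
    foreach my $value (@values) {
        next unless defined $value && $value =~ /^-?\d+(?:\.\d+)?$/;
        $total += $value;
    }
    return $total;
}

# Pull just the values out of a query string such as "a=2&b=3&c=10".
sub values_from {
    my ($query) = @_;
    return map { (split /=/, $_, 2)[1] } split /&/, $query;
}

# Use sample data when the script is run outside a Web server.
my $query = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : "a=2&b=3&c=10";
my $total = total_of(values_from($query));

print "Content-type: text/html\n\n";
print "<html><body><p>The total is $total.</p></body></html>\n";
```

The arithmetic is trivial; the point is that the browser has to make a round trip to the server, and receive a whole new page, just to see the total.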
"Server-side" Programming Alternatives: There are numerous alternatives to CGI. CGI is the oldest, most widely used method of server-side Web programming, and it is supported by every different type of Web server. In recent years many other methods have become available for accomplishing the same tasks performed by CGI programs: ASP, JSP, Cold Fusion, etc. These are typically proprietary packages. This means you have to buy them from a specific company, they may require unique Web server environments, and your support options are more limited. Their main advantages over CGI are that they often run faster (for reasons we'll get into later in the semester) and that they may integrate easily with certain other applications (for example, Microsoft has a reputation for making its applications work well with each other, but making them difficult to work with competitors' products).

What is Perl, and Why Should I Use It For CGI Programming?

Perl was created by Larry Wall, who has a background in linguistics, so you may find it a bit easier to understand than some other programming languages. It was created before the advent of the World Wide Web, primarily to serve as a tool for Unix administration tasks. The features that make it good for Unix administration also happen to make it a good choice for CGI programming. Its name comes from "Practical Extraction and Report Language," which is a concise description of what it's good at: extracting data from various sources (e.g. text files, CGI submissions, databases), manipulating it in desired ways (e.g. decoding the HTTP-encoded data that comes from a Web server), and generating a report (e.g. an HTML formatted page containing the desired information).
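Here is a small taste of "extraction and report" in practice. The sketch below pulls names and ratings out of some lines of text and turns them into a bit of HTML. The data format ("Pat: 4") and the function names are invented for this example.

```perl
#!perl
use strict;
use warnings;

# Extract: pull a name and a 1-5 rating out of lines like "Pat: 4".
# Lines that do not match the pattern are ignored.
sub extract_ratings {
    my (@lines) = @_;
    my %rating;
    foreach my $line (@lines) {
        if ($line =~ /^\s*(\w+)\s*:\s*([1-5])\s*$/) {
            $rating{$1} = $2;
        }
    }
    return %rating;
}

# Report: format the extracted data as an HTML list with an average.
sub report {
    my (%rating) = @_;
    my $html = "<ul>\n";
    my $sum  = 0;
    foreach my $name (sort keys %rating) {
        $html .= "<li>$name rated us $rating{$name}</li>\n";
        $sum  += $rating{$name};
    }
    $html .= "</ul>\n";
    my $count = keys %rating;
    $html .= sprintf("<p>Average rating: %.1f</p>\n", $sum / $count) if $count;
    return $html;
}

print report(extract_ratings("Pat: 4", "no rating here", "Lee: 5"));
```

Don't worry about the details of the pattern matching yet; we'll spend a good part of the first five classes on it.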

Some other things to know about Perl:

Perl is powerful and handy: there are hundreds of add-on "modules" that have been created for Perl. These modules are components designed to perform specific tasks, such as formatting data for output to HTML documents, communicating with databases, etc. These modules are freely available for your use, providing off-the-shelf solutions for a wide range of tasks. But when you need to create your own solutions, you'll find Perl is still very handy. It's the programming language equivalent of duct tape: it may not provide the most attractive, theoretically elegant solution to a problem, but it will provide solutions that work.
Perl is an excellent "glue" language. Perl is very adept at formatting data in various ways, parsing incoming data, and talking to ports and sockets on your server. This makes it a good choice for CGI programming - it's great at doing things like getting your Web server and databases talking to each other.
Perl is a procedural language. This makes it similar to languages like Pascal and C in many respects, but different from object-oriented languages like C++ and Java. However, the most recent version of Perl (Perl 5) does support some object-oriented programming techniques as well.
Perl is an interpreted language. This means that after you finish writing a script, you do not need to compile it. Each time you run a Perl script, it is sent to the Perl interpreter, which is actually responsible for executing the script. This means that Perl scripts typically run more slowly than programs written in compiled languages (such as C++), but it also means that the development and debugging process is usually faster and easier.
It's popular. Perl is the most common choice for doing CGI programming. You'll find free Perl scripts and modules all over the Web, and a large on-line Perl community that's available to help when you're really stuck on a programming problem.
It's free! You can download the latest versions of Perl from the Web for free - you'll also find it on the CD-ROM that comes with your book.
It's portable. Perl was originally developed on Unix, and is now also well established with DOS, Windows, and Macs. With some minor adjustments, you can take a script written on one platform and use it on another.
It's relatively easy to learn and use: many of the Perl programming rules provide more flexibility than other programming languages (e.g. loose data types), which means beginners make fewer mistakes. In particular, Perl is a "free-form" language. You typically do not have to worry about things like the exact places in your scripts where you put line breaks (you can even put them in the middle of a statement!). Also, the built-in Perl debugger provides helpful feedback on fixing syntax mistakes.
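To illustrate the last point, here is a short sketch of loose data types and free-form layout. The variable names and the function name demo are arbitrary, and the odd line breaks are there only to show that Perl allows them, not because I recommend writing code this way.

```perl
#!perl
use strict;
use warnings;

sub demo {
    # Loose data types: a scalar variable can hold a number or a string,
    # and Perl converts between the two as needed.
    my $count = 24;                # a number
    my $label = "CIS " . $count;   # the number used as text: "CIS 24"
    my $next  = "24" + 1;          # text used as a number: 25

    # Free-form layout: one statement can be spread over several lines.
    my $message
        =
        "$label, then $next"
        ;

    return $message;
}

print demo(), "\n";   # prints "CIS 24, then 25"
```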

Lab: Installing Perl, and How to Use It

What You Need to Use Perl: You need three things: 1. Perl installed on your computer, 2. a command prompt (for Windows, go to the Start menu, then Programs and select MS-DOS Prompt or Command Prompt), and 3. a text editor (for Windows, go to the Start menu, then Programs, then Accessories and select Notepad). To do CGI programming, you also need a Web server - we'll get to that later in the semester.
Checking the Perl installation in the Lab: Perl should already be installed on the Lab computers. In your MS-DOS window, type:
```
perl -v
```
You should get a response containing information on the version of Perl installed on your computer. If you get anything else, let me know.
Installing Perl at Home: if you have a computer at home, you can install Perl on it too. Chapter 1 of your book has excellent instructions on how to do this for computers running Windows, Unix, or the Mac OS.
Your First Perl Script: In Notepad, type the following:
```
#!perl
print "Hello World!\n";
```
Then save the file to the "temp" folder on the C drive, with the name "hello.pl". In Windows, it's important to save your Perl scripts with the extension ".pl" - that way Windows will know that it should treat the file as a Perl script. In Unix, the file extension does not matter.
The first line of the script is called the "shebang" line. It's used to invoke (call) the Perl interpreter on your computer. We'll talk about this more in a later class. The second line is a simple Perl statement that prints the message "Hello World!" followed by a linefeed (that's what the "\n" does). The semicolon at the end indicates that this is the end of the statement.
Now go to your MS-DOS Window and move to the C:\temp directory. Then type:
```
hello.pl
```
You should get the message "Hello World!" as a response. If you don't, let me know. Note that the above only works on Windows NT. If you did this on Windows 95/98 or Unix, you'd type:
```
perl hello.pl
```
