As you traverse the vast frontier of the World Wide Web, you will come across documents that make you wonder, "How did they do this?" These documents could consist of, among other things, forms that ask for feedback or registration information, imagemaps that allow you to click on various parts of the image, counters that display the number of users that accessed the document, and utilities that allow you to search databases for particular information. In most cases, you'll find that these effects were achieved using the Common Gateway Interface, commonly known as CGI.

One of the Internet's worst-kept secrets is that CGI is astoundingly simple. That is, it's trivial in design, and anyone with an iota of programming experience can write rudimentary scripts that work. It's only when your needs are more demanding that you have to master the more complex workings of the Web. In a way, CGI is easy the same way cooking is easy: anyone can toast a muffin or poach an egg. It's only when you want a Hollandaise sauce that things start to get complicated.

CGI is the part of the Web server that can communicate with other programs running on the server. With CGI, the Web server can call up a program, while passing user-specific data to the program (such as what host the user is connecting from, or input the user has supplied using HTML form syntax). The program then processes that data and the server passes the program's response back to the Web browser. CGI isn't magic; it's just programming with some special types of input and a few strict rules on program output. Everything in between is just programming. Of course, there are special techniques that are particular to CGI, and that's what this book is mostly about. But underlying it all is the simple model shown in Figure 1.1.
So how does the whole interface work? Most servers expect CGI programs and scripts to reside in a special directory, usually called cgi-bin, and/or to have a certain file extension. (These configuration parameters are discussed in the Configuring the Server section in this chapter.) When a user opens a URL associated with a CGI program, the client sends a request to the server asking for the file. For the most part, the request for a CGI program looks the same as it does for all Web documents. The difference is that when a server recognizes that the address being requested is a CGI program, the server does not return the file contents verbatim. Instead, the server tries to execute the program. Here is what a sample client request might look like:
GET /cgi-bin/welcome.pl HTTP/1.0
Accept: www/source
Accept: text/html
Accept: image/gif
User-Agent: Lynx/2.4 libwww/2.14
From: shishir@bu.edu
This GET request identifies the file to retrieve as /cgi-bin/welcome.pl. Since the server is configured to recognize all files in the cgi-bin directory tree as CGI programs, it understands that it should execute the program instead of relaying it directly to the browser. The string HTTP/1.0 identifies the communication protocol to use. The client request also passes the data formats it can accept (www/source, text/html, and image/gif), identifies itself as a Lynx client, and sends user information. All this information is made available to the CGI program, along with additional information from the server.
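To make the request line concrete, here is a minimal sketch in C of how a server might split it into its three parts and decide whether the path points into the cgi-bin tree. The names request_line, parse_request_line, and is_cgi_path are made up for this illustration, and the fixed "/cgi-bin/" prefix is an assumption; real servers take this from their configuration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* The three whitespace-separated parts of a request line such as
   "GET /cgi-bin/welcome.pl HTTP/1.0". Buffer sizes are arbitrary. */
struct request_line {
    char method[16];
    char path[256];
    char protocol[16];
};

/* Split a request line into method, path, and protocol.
   Returns true only if all three parts were found. */
bool parse_request_line(const char *line, struct request_line *out)
{
    return sscanf(line, "%15s %255s %15s",
                  out->method, out->path, out->protocol) == 3;
}

/* Assumption for this sketch: anything under /cgi-bin/ is a CGI program. */
bool is_cgi_path(const char *path)
{
    static const char prefix[] = "/cgi-bin/";
    return strncmp(path, prefix, sizeof prefix - 1) == 0;
}
```

A server built along these lines would call parse_request_line on the first line of the request and execute the program, rather than return the file, whenever is_cgi_path reports a match.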
The way that CGI programs get their input depends on the server and on the native operating system. On a UNIX system, CGI programs get their input from standard input (STDIN) and from UNIX environment variables. These variables store such information as the input search string (in the case of a form), the format of the input, the length of the input (in bytes), the remote host and user passing the input, and other client information. They also store the server name, the communication protocol, and the name of the software running the server. Once the CGI program starts running, it can either create and output a new document, or provide the URL to an existing one. On UNIX, programs send their output to standard output (STDOUT) as a data stream. The data stream consists of two parts. The first part is either a full or partial HTTP header that (at minimum) describes what format the returned data is in (e.g., HTML, plain text, GIF, etc.). A blank line signifies the end of the header section. The second part is the body, which contains the data conforming to the format type reflected in the header. The body is not modified or interpreted by the server in any way. A CGI program can choose to send the newly created data directly to the client or to send it indirectly through the server. If the output consists of a complete HTTP header, the data is sent directly to the client without server modification. (It's actually a little more complicated than this, as we will discuss in Chapter 3, Output from the Common Gateway Interface.) Or, as is usually the case, the output is sent to the server as a data stream. The server is then responsible for adding the complete header information and using the HTTP protocol to transfer the data to the client.
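The two-part output stream described above (a header, a blank line, then the body) can be sketched in C as follows. This is only an illustration of the partial-header case, where the program supplies the content type and the server adds the rest; the function name format_cgi_response is hypothetical.

```c
#include <stdio.h>

/* Build a minimal CGI response in buf: one Content-Type header line,
   a blank line that ends the header section, then the body.
   Returns the length written, or -1 if buf is too small. */
int format_cgi_response(char *buf, size_t size,
                        const char *content_type, const char *body)
{
    int n = snprintf(buf, size, "Content-Type: %s\n\n%s", content_type, body);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

A CGI program would then write the buffer to standard output, for example with fputs(buf, stdout). Whatever follows the blank line is passed along as the body in the format the header declares.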
CGI Programming 101: CGI Programming With Apache and Perl on Windows XP
This page will show you how to install the Apache web server and Perl on your home computer. You'll then be able to write CGI programs and test them locally on your computer. Once Apache is installed and running, you'll be able to view your pages by pointing your web browser at the http://localhost/ address. You don't even need to be connected to the internet to view local pages and CGI programs, which can be quite useful if you want to work on programming while you're traveling or otherwise offline. These instructions have been tested on Windows XP. You should be able to install Apache and Perl on earlier versions of Windows, but on those systems you should definitely read the installation instructions that come with the software, since some things may need to be configured differently.
Who can see your website?
Programming Locally, then Uploading to the ISP
Differences Between CGI Programs on Unix and Windows
Installing Apache on Windows XP
Installing ActivePerl on Windows XP
Configuring Apache
Viewing Your Site
Writing Your CGI Programs
Other Perl Editors
Troubleshooting

Who can see your website?
If you have a permanent, fixed IP address for your computer (e.g. your computer is in an office, or you have your own T1 line), your Apache server will be able to serve pages to anyone in the world*. If you have a transient IP address (e.g. you use a dialup modem, DSL modem or cable modem to connect to the internet), you can give people your temporary IP address and they can access your page using the IP address instead of a host name (e.g., http://209.189.198.102/)*. But when you log out, your server will no longer be connected, and when you dial in again you'll probably have a different IP address. For permanent web hosting, you should either get a fixed IP address (and your own domain name), or sign up with an ISP that can host your pages for you (like cgi101.com).

* Unless you're behind a firewall, and the firewall is not configured to allow web traffic through.
Programming Locally, then Uploading to the ISP
You may want to develop and debug your programs on your own computer, then upload the final working versions to your ISP for permanent hosting. Nearly all of the programs shown in CGI Programming 101 will work seamlessly on Unix or Windows, but see below for a few differences.
Differences between CGI Programs on Unix and Windows
1. The "shebang" line. The first line of a Perl program (often called the shebang line) typically looks like this:
#!/usr/bin/perl
The actual location of Perl may be different from system to system (e.g. /bin/perl, /usr/local/bin/perl, etc.). For ActivePerl in Windows, this line should be changed to:
#!/perl/bin/perl
If you're programming locally and uploading to a remote ISP, you'll have to change this line each time, unless your ISP was thoughtful enough to add a symlink to Perl in /perl/bin/perl. (We've done that on cgi101.com.)
2. Permissions. On XP you don't need to worry about file permissions. A CGI program is always executable, and your programs can always write to files in your directory. (Although, this isn't necessarily a good thing...) On Unix, permissions matter. Your CGI programs will need to be set with execute permissions. Any files you want to write to will need to be set with write permissions. CGI Programming 101 includes instructions on how to properly adjust file permissions for CGI programs in Unix. If you are writing your programs on XP and are not planning to upload them to a Unix server, you can simply disregard the permissions information.
First go to http://httpd.apache.org/download.cgi and download Apache. Scroll down the page a bit until you find the one that says "best available version" (Apache 2.something). Then look for the "Win32 Binary (MSI Installer)". Download the binary .msi file to your computer (choose "open" rather than "save" so the installer will launch immediately).
Server Information - use localhost for both the Network Domain and the Server Name, unless you have a fixed IP address and your own domain name. Put your e-mail address for the Administrator's Email Address.
Finish the installation and quit the installer. At this point Apache is probably already running on your machine; go to http://localhost/ in your browser to view your start page.
To start/stop the Apache server, go to the Start menu and navigate to All Programs > Apache HTTP Server > Control Apache Server. There you can start, stop and restart Apache. You can also install the Apache taskbar icon via the "Monitor Apache Servers" option.
If you want to modify the homepage displayed by your Apache server, go to the Start menu and choose "My Computer", then navigate to Local Disk (C:) > Program Files > Apache Group > Apache 2. You'll see a folder containing items like this:
Open the htdocs folder and look for index.html. You can edit the file in Notepad or whatever HTML editor you like. For the programming examples in CGI Programming 101 we're going to create a separate folder in your "My Documents" area for CGI programs and HTML files. There's not really any need to modify the files here in htdocs unless you are setting up your own webserver and plan to host your own domain there.
Installing Perl should be just as easy as installing Apache. Go to http://www.activestate.com/Products/ActivePerl/ and click on the download link to begin. Download the latest version of Perl available (which is 5.8.1 as of November 2003). Download the MSI file and open it.
On the Custom Setup screen, you can leave the setup as the default. This will install Perl, PPM (the Perl Package Manager) and programming examples to your hard drive in the location C:\Perl.
The "new features in PPM" screen talks about a PPM profile feature, but that requires ASPN (the full, commercial version) Perl, which you probably aren't installing right now. Leave the "Enable PPM3 to send profile info to ASPN" option unchecked.
Under Choose Setup Options, both "Add Perl to the PATH environment variable" and "Create Perl file extension association" should be checked.
The installer will finish up by installing HTML documentation. This step will take a while so be patient. When it's finished, your browser will launch and bring up the ActivePerl documentation:
Bookmark this page now (in your browser's favorites menu) so you can access it easily later.
Now Perl is installed. All you need to do now is modify the Apache server configuration.
Configuring Apache
First go to the Start menu and go to "My Documents". Make a new folder there called "My Website". This is where you're going to store your web pages and CGI programs. Next you need to modify the Apache configuration file to tell it where your pages are, and enable CGI programs. Go back to the Start menu and navigate to All Programs > Apache HTTP Server > Configure Apache Server > Edit the Apache httpd.conf Configuration file. The config file will be opened for you in Notepad. Scroll down (or use Find) until you get to the UserDir section of the file. It should have a line like this:
UserDir "My Documents/My Website"
Note: Apache 2.2 doesn't have a UserDir section. If you're using Apache 2.2, you'll have to add the UserDir line and the Directory section (see below). See http://httpd.apache.org/docs/2.2/mod/mod_userdir.html for more info on this.

Scroll down just past that and you'll come to a commented section for Directory:
#<Directory "C:/Documents and Settings/*/My Documents/My Website">
#    AllowOverride FileInfo AuthConfig Limit
#    Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
#    <Limit GET POST OPTIONS PROPFIND>
#        Order allow,deny
#        Allow from all
#    </Limit>
#    <LimitExcept GET POST OPTIONS PROPFIND>
#        Order deny,allow
#        Deny from all
#    </LimitExcept>
#</Directory>
Uncomment this entire section (by removing the pound signs at the beginning of each line), and change the Options line to this:
Options MultiViews Indexes SymLinksIfOwnerMatch Includes ExecCGI
Options specifies what options are available in this directory. The important ones here are Includes, which enables server-side includes, and ExecCGI, which enables CGI programs in this directory. (Indexes, which was already present, lets the server show a directory listing when a folder has no index page.)

Scroll down a bit further to the DirectoryIndex line, and add index.cgi to the end of that line:
DirectoryIndex index.html index.html.var index.cgi
Now scroll down several pages (or use Find) to the AddHandler section. Uncomment the CGI line:
AddHandler cgi-script .cgi
This causes any file with a .cgi extension to be processed as a CGI program. If you want to also have files with a .pl extension be processed as CGI programs, add the .pl extension on that same line:
AddHandler cgi-script .cgi .pl
This causes all .html files to be searched for server-side include tags.

Now save the configuration file, and restart Apache. Check http://localhost/ in your browser to ensure that the server restarted successfully.

Trouble? If you get an error like the following:
Only one usage of each socket address (protocol/network address/port) is normally permitted. : make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
This probably means you're already running another web server (such as IIS) on your machine. You'll need to remove IIS in order to run Apache. See the following Microsoft document on How to Remove IIS.
Viewing Your Site
http://localhost/ is the homepage for your site; it shows the index.html page located in the htdocs folder.
To view the pages in your "My Website" folder, the actual URL is http://localhost/~my username/. For example, on my computer, my username is "Jackie Hamilton", so the URL to my pages is http://localhost/~Jackie Hamilton/. (If you don't know your username, open the Start menu; your username is at the top of the Start box.) In your browser, go ahead and type in the URL to your web page. If you remembered to create the "My Website" folder earlier, you should now see an empty directory listing. Bookmark the page so you don't have to type in the long URL any more.
Now you're ready to write some CGI programs! Here's a simple one you can use to get started. You can write this in Notepad:
#!/perl/bin/perl -wT
print "Content-type: text/html\n\n";
print "<h2>Hello, World!</h2>\n";
Unfortunately Notepad has a nasty habit of appending .txt to the end of all text files, so when you go to save this file, change the "Save as Type" from "Text Documents" to "All Files". Then put "first.cgi" as the file name. Save it in your My Website folder, then reload your web page in your browser. You should see first.cgi listed there; click on it to view your first CGI program! Now go to Chapter 1 to start learning CGI programming.
Other Perl Editors
You can get by just fine by writing all of your CGI programs in Notepad. But you might find it more helpful to use a proper Perl editor for writing code. ActiveState (the generous folks who provide ActivePerl for free) also sells Visual Perl, a Perl plug-in for Visual Studio .NET. EditPlus is a shareware ($30) text/HTML/programming editor with syntax highlighting for various languages. The DzSoft Perl Editor offers syntax coloring (and checking), a builtin "run" option that you can use to test your scripts (and view error messages), a file template for new files, quick-insert shortcuts, and other useful tools. This program is shareware ($49) but a demo is available for download and evaluation.
OptiPerl is a visual developing environment and editor for creating, testing, debugging and running perl scripts. A free trial download is available, and if you decide to keep it, the standard license is $39. Perl Editor by EngInSite is an integrated development environment for creating, testing and debugging Perl scripts.
Troubleshooting
If you get an error like this when you try to start Apache:
Only one usage of each socket address (protocol/network address/port) is normally permitted. : make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs
This probably means you already have another web server program (like IIS) running. You'll need to turn the other one off before you can start Apache. To disable this, go to the Control Panel->Administrative Tools->Services, and look for the IIS service. Right-click to stop the service.
Why CGI programming?
A basic example
Analysis of the example
So what is CGI programming?
Using a C program as a CGI script
The Hello world test
How to process a simple form
Using METHOD="POST"
Further reading
This is an introduction to writing CGI programs in the C language. The reader is assumed to know the basics of C as well as how to write simple forms in HTML and to be able to install CGI scripts on a Web server. The principles are illustrated with very simple examples. Two important warnings:

To avoid wasting your time, please check (from applicable local documents or by contacting your local webmaster) whether you can install and run CGI scripts written in C on the server. At the same time, please check how to do that in detail, specifically where you need to put your CGI scripts.

This document was written to illustrate the idea of CGI scripting to C programmers. In practice, CGI programs are usually written in other languages, such as Perl, and for good reasons: except for very simple cases, CGI programming in C is clumsy and error-prone.
A basic example
The above-mentioned How the web works: HTTP and CGI explained is a great tutorial. The following introduction of mine is just another attempt to present the basics; please consult other sources if you get confused or need more information.

Let us consider the following simple HTML form:

<form action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/mult.cgi">
<div><label>Multiplicand 1: <input name="m" size="5"></label></div>
<div><label>Multiplicand 2: <input name="n" size="5"></label></div>
<div><input type="submit" value="Multiply!"></div>
</form>

It will look like the following on your current browser:
Multiplicand 1:
Multiplicand 2:
Multiply!
You can try it if you like. Just in case the server used isn't running and accessible when you try it, here's what you would get as the result:
Multiplication results
This was constructed from that part of the ACTION value that follows the host name, by appending a question mark ? and the form data in a specifically encoded format. The server to which the request was sent (in this case, www.cs.tut.fi) will then process it according to its own rules. Typically, the server's configuration defines how the relative URLs are mapped to file names and which directories/folders are interpreted as containing CGI scripts. As you may guess, the part cgi-bin/ in the URL causes such interpretation in this case. This means that instead of just picking up and sending back (to the browser that sent the request) an HTML document or some other file, the server invokes a script or a program specified in the URL (mult.cgi in this case) and passes some data to it (the data m=4&n=9 in this case).
It depends on the server how this really happens. In this particular case, the server actually runs the (executable) program in the file mult.cgi in the subdirectory cgi-bin of user jkorpela's home directory. It could be something quite different, depending on server configuration.
Invocation means different things in different cases. For a Perl script, the server would invoke a Perl interpreter and make it execute the script in an interpretive manner. For an executable program, which has typically been produced by a compiler and a loader from a source program in a language like C, it would just be started as a separate process. Although the word script typically suggests that the code is interpreted, the term CGI script refers both to such scripts and to executable programs. See the answer to question Is it a script or a program? in CGI Programming FAQ by Nick Kew.
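As a small illustration of how the request is put together from the ACTION path and the form data, here is a sketch in C. The function name build_get_path is invented for this example, and it only handles the two numeric fields m and n of the form above; a real browser also percent-encodes special characters, which plain numbers do not need.

```c
#include <stdbool.h>
#include <stdio.h>

/* Append "?m=<m>&n=<n>" to the path part of the ACTION value.
   Returns false if the result does not fit in buf. */
bool build_get_path(char *buf, size_t size,
                    const char *action_path, long m, long n)
{
    int len = snprintf(buf, size, "%s?m=%ld&n=%ld", action_path, m, n);
    return len >= 0 && (size_t)len < size;
}
```

With the path /cgi-bin/run/~jkorpela/mult.cgi and the values 4 and 9, the buffer holds /cgi-bin/run/~jkorpela/mult.cgi?m=4&n=9, and the part after the question mark is what the program later receives as its data.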
You need to compile and load your C program on the server (or, in principle, on a system with the same architecture, so that binaries produced for it are executable on the server too).
Normally, you would proceed as follows:
1. Compile and test the C program in normal interactive use.
2. Make any changes that might be needed for use as a CGI script. The program should read its input according to the intended form submission method. Using the default GET method, the input is to be read from the environment variable QUERY_STRING. (The program may also read data from files, but these must then reside on the server.) It should generate output on the standard output stream (stdout) so that it starts with suitable HTTP headers. Often, the output is in HTML format.
3. Compile and test again. In this testing phase, you might set the environment variable QUERY_STRING so that it contains the test data as it will be sent as form data. E.g., if you intend to use a form where a field named foo contains the input data, you can give the command setenv QUERY_STRING "foo=42" (when using the tcsh shell) or export QUERY_STRING="foo=42" (when using the bash shell; without export, the variable is not passed on to the program).
4. Check that the compiled version is in a format that works on the server. This may require a recompilation. You may need to log on to the server computer (using Telnet, SSH, or some other terminal emulator) so that you can use a compiler there.
5. Upload the compiled and loaded program, i.e. the executable binary program (and any data files needed), to the server.
6. Set up a simple HTML document that contains a form for testing the script, etc.
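For the testing phase in step 3, it can help to keep the field lookup in a function that takes the query string as a parameter, so that it can be tried both with getenv("QUERY_STRING") and with a literal test string. The following is only a sketch under that assumption; the name find_field is hypothetical, and the value is returned still in its encoded form.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Look up name=value in a query string of the form "a=1&foo=42&b=2".
   Copies the (still encoded) value into value. Returns false if the
   field is missing or the value does not fit. */
bool find_field(const char *query, const char *name,
                char *value, size_t value_size)
{
    size_t name_len = strlen(name);
    if (query == NULL || value_size == 0)
        return false;

    const char *pair = query;
    while (*pair != '\0') {
        /* Each pair runs up to the next '&' or the end of the string. */
        const char *pair_end = strchr(pair, '&');
        if (pair_end == NULL)
            pair_end = pair + strlen(pair);

        if ((size_t)(pair_end - pair) > name_len &&
            strncmp(pair, name, name_len) == 0 && pair[name_len] == '=') {
            size_t len = (size_t)(pair_end - pair) - name_len - 1;
            if (len >= value_size)
                return false;
            memcpy(value, pair + name_len + 1, len);
            value[len] = '\0';
            return true;
        }
        pair = (*pair_end == '&') ? pair_end + 1 : pair_end;
    }
    return false;
}
```

In the CGI program itself, the call would be something like find_field(getenv("QUERY_STRING"), "foo", buf, sizeof buf), which also copes with the variable being unset.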
You need to put the executable into a suitable directory and name it according to server-specific conventions. Even the compilation commands needed here might differ from what you are used to on your workstation. For example, if the server runs some flavor of Unix and has the GNU C compiler available, you would typically use a compilation command like gcc -o mult.cgi mult.c and then move (mv) mult.cgi to a directory with a name like cgi-bin. Instead of gcc, you might need to use cc. You really need to check local instructions for such issues. The filename extension .cgi has no fixed meaning in general. However, there can be server-dependent (and operating system dependent) rules for naming executable files. Typical extensions for executables are .cgi and .exe.
It depends on the scripting or programming language used how a program can access the value of an environment variable. In the C language, you would use the library function getenv (declared in the standard header stdlib.h) to access the value as a string. You might then use various techniques to pick up data from the string, convert parts of it to numeric values, etc.

The output from the script or program to the primary output stream (such as stdout in the C language) is handled in a special way. Effectively, it is directed so that it gets sent back to the browser. Thus, by writing a C program that writes an HTML document onto its standard output, you will make that document appear on the user's screen as a response to the form submission.

In this case, the source program in C is the following:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  char *data;
  long m, n;

  /* HTTP header line (ended by CR LF), then an empty line */
  printf("%s%c%c\n", "Content-Type:text/html;charset=iso-8859-1", 13, 10);
  printf("<TITLE>Multiplication results</TITLE>\n");
  printf("<H3>Multiplication results</H3>\n");
  data = getenv("QUERY_STRING");
  if(data == NULL)
    printf("<P>Error! Error in passing data from form to script.");
  else if(sscanf(data, "m=%ld&n=%ld", &m, &n) != 2)
    printf("<P>Error! Invalid data. Data must be numeric.");
  else
    printf("<P>The product of %ld and %ld is %ld.", m, n, m*n);
  return 0;
}
As a disciplined programmer, you have probably noticed that the program makes no check against integer overflow, so it will return bogus results for very large operands. In real life, such checks would be needed, but such considerations would take us too far from our topic.
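For readers who want to see what such a check could look like, here is one possible sketch. It is not part of the program above; the function name checked_mul is made up, and the approach (comparing against LONG_MAX and LONG_MIN by division before multiplying) is just one common way of doing it.

```c
#include <limits.h>
#include <stdbool.h>

/* Multiply a and b without ever overflowing. On success, stores the
   product in *result and returns true; returns false if the product
   would not fit in a long. */
bool checked_mul(long a, long b, long *result)
{
    if (a > 0) {
        if (b > 0) {
            if (a > LONG_MAX / b) return false;
        } else {
            if (b < LONG_MIN / a) return false;
        }
    } else {
        if (b > 0) {
            if (a < LONG_MIN / b) return false;
        } else {
            if (a != 0 && b < LONG_MAX / a) return false;
        }
    }
    *result = a * b;
    return true;
}
```

The multiplication program could call this instead of computing m*n directly, and print an error message when it returns false.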
Note: The first printf function call prints out data that will be sent by the server as an HTTP header. This is required for several reasons, including the fact that a CGI script can send any data (such as an image or a plain text file) to the browser, not just HTML documents. For HTML documents, you can just use the printf function call above as such; however, if your character encoding is different from ISO 8859-1 (ISO Latin 1), which is the most common on the Web, you need to replace iso-8859-1 by the registered name of the encoding (charset) you use.
I have compiled this program and saved the executable program under the name mult.cgi in my directory for CGI scripts at www.cs.tut.fi. This implies that any form with action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/mult.cgi" will, when submitted, be processed by that program.
Consequently, anyone could write a form of his own with the same ACTION attribute and pass whatever data he likes to my program. Therefore, the program needs to be able to handle any data. Generally, you need to check the data before starting to process it.
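As an illustration of checking the data first, the sketch below parses the query string m=...&n=... more strictly than the sscanf call in the program above: it rejects trailing garbage and values that do not fit in a long. The function names are invented for this example, and the exact rules (digits with an optional minus sign, fields in this fixed order) are assumptions you may want to relax.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse "<name>=<integer>" at *cursor and advance *cursor past it. */
static bool parse_long_field(const char **cursor, const char *name, long *value)
{
    size_t name_len = strlen(name);
    if (strncmp(*cursor, name, name_len) != 0 || (*cursor)[name_len] != '=')
        return false;

    const char *start = *cursor + name_len + 1;
    /* Only a digit or a minus sign may start the number; this rules out
       the leading blanks and plus signs that strtol would accept. */
    if (!isdigit((unsigned char)*start) && *start != '-')
        return false;

    char *end;
    errno = 0;
    long parsed = strtol(start, &end, 10);
    if (end == start || errno == ERANGE)
        return false;

    *value = parsed;
    *cursor = end;
    return true;
}

/* Accept exactly "m=<integer>&n=<integer>" and nothing else. */
bool parse_operands(const char *query, long *m, long *n)
{
    if (query == NULL)
        return false;
    const char *cursor = query;
    if (!parse_long_field(&cursor, "m", m))
        return false;
    if (*cursor != '&')
        return false;
    cursor++;
    if (!parse_long_field(&cursor, "n", n))
        return false;
    return *cursor == '\0';
}
```

With this in place, the program would call parse_operands(getenv("QUERY_STRING"), &m, &n) and report invalid data whenever it returns false.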
Using METHOD="POST"
The idea of METHOD="POST"
Let us consider next a different processing for form data. Assume that we wish to write a form that takes a line of text as input so that the form data is sent to a CGI script that appends the data to a text file on the server. (That text file could be readable by the author of the form and the script only, or it could be made readable to the world through another script.) It might seem that the problem is similar to the example considered above; one would just need a different form and a different script (program). In fact, there is a difference. The example above can be regarded as a pure query that does not change the state of the world. In particular, it is idempotent, i.e. the same form data could be submitted as many times as you like without causing any problems (except minor waste of resources). However, our current task needs to cause such changes: a change in the content of a file that is intended to be more or less permanent. Therefore, one should use METHOD="POST". This is explained in more detail in the document Methods GET and POST in HTML forms - what's the difference? Here we will take it for granted that METHOD="POST" needs to be used and we will consider the technical implications.
For forms that use METHOD="POST", CGI specifications say that the data is passed to the script or program in the standard input stream (stdin), and the length (in bytes, i.e. characters) of the data is passed in an environment variable called CONTENT_LENGTH.
Reading input
Reading from standard input probably sounds simpler than reading from an environment variable, but there are complications. The server is not required to pass the data so that when the CGI script tries to read more data than there is, it would get an end of file indication! That is, if you read e.g. using the getchar function in a C program, it is undefined what happens after reading all the data characters; it is not guaranteed that the function will return EOF.
When reading the input, the program must not try to read more than CONTENT_LENGTH characters.
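The rule above can be sketched as a small helper that reads exactly as many bytes as CONTENT_LENGTH announces and never more. This is an illustration only: the name read_post_body and its error convention (returning -1) are assumptions, and it uses fread rather than the fgets call of the sample program that follows.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read exactly the number of bytes given by length_text (the value of
   CONTENT_LENGTH) from in into buf, and NUL-terminate the result.
   Returns the number of bytes read, or -1 if the length is missing,
   malformed, too large for buf, or more than the stream delivers. */
long read_post_body(FILE *in, const char *length_text,
                    char *buf, size_t buf_size)
{
    if (length_text == NULL || buf_size == 0)
        return -1;

    char *end;
    errno = 0;
    long len = strtol(length_text, &end, 10);
    if (end == length_text || *end != '\0' || errno == ERANGE)
        return -1;
    if (len < 0 || (unsigned long)len >= buf_size)
        return -1;

    /* Ask for len bytes and no more; never wait for an end of file. */
    size_t got = fread(buf, 1, (size_t)len, in);
    if (got != (size_t)len)
        return -1;
    buf[got] = '\0';
    return (long)got;
}
```

In a CGI program the call would look like read_post_body(stdin, getenv("CONTENT_LENGTH"), input, sizeof input).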
Sample program: accept and append data
A relatively simple C program for accepting input via CGI and METHOD="POST" is the following:

#include <stdio.h>
#include <stdlib.h>

#define MAXLEN 80
#define EXTRA 5
/* 4 for field name "data", 1 for "=" */
#define MAXINPUT (MAXLEN+EXTRA+2)
/* 1 for added line break, 1 for trailing NUL */
#define DATAFILE "../data/data.txt"

/* Decode the URL-encoded text between src and last into dest,
   then add a line break and a terminating NUL. */
void unencode(char *src, char *last, char *dest)
{
  /* src < last (rather than src != last) so that a stray '%' near
     the end of the data cannot make the loop run past last */
  for(; src < last; src++, dest++)
    if(*src == '+')
      *dest = ' ';
    else if(*src == '%') {
      unsigned int code;
      if(sscanf(src+1, "%2x", &code) != 1) code = '?';
      *dest = code;
      src += 2; }
    else
      *dest = *src;
  *dest = '\n';
  *++dest = '\0';
}

int main(void)
{
  char *lenstr;
  char input[MAXINPUT] = "", data[MAXINPUT];
  long len;

  printf("%s%c%c\n", "Content-Type:text/html;charset=iso-8859-1", 13, 10);
  printf("<TITLE>Response</TITLE>\n");
  lenstr = getenv("CONTENT_LENGTH");
  if(lenstr == NULL || sscanf(lenstr, "%ld", &len) != 1 || len < 0 || len > MAXLEN)
    printf("<P>Error in invocation - wrong FORM probably.");
  else {
    FILE *f;
    fgets(input, len+1, stdin);
    unencode(input+EXTRA, input+len, data);
    f = fopen(DATAFILE, "a");
    if(f == NULL)
      printf("<P>Sorry, cannot store your data.");
    else {
      /* close the file and thank the user only if it was opened */
      fputs(data, f);
      fclose(f);
      printf("<P>Thank you! Your contribution has been stored.");
    }
  }
  return 0;
}

Essentially, the program retrieves the information about the number of characters in the input from the value of the CONTENT_LENGTH environment variable. Then it unencodes (decodes) the data, since the data arrives in the specifically encoded format that was already mentioned. The program has been written for a form where the text input field has the name data (actually, just the length of the name matters here). For example, if the user types

Hello there!

then the data will be passed to the program encoded as

data=Hello+there%21

(with space encoded as + and exclamation mark encoded as %21). The unencode routine in the program converts this back to the original format. After that, the data is appended to a file (with a fixed file name), and a short confirmation is sent back to the user.

Having compiled the program I have saved it as collect.cgi into the directory for CGI scripts. Now a form like the following can be used for data submissions:

<FORM ACTION="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/collect.cgi" METHOD="POST">
<DIV>Your input (80 chars max.):<BR>
<INPUT NAME="data" SIZE="60" MAXLENGTH="80"><BR>
<INPUT TYPE="SUBMIT" VALUE="Send"></DIV>
</FORM>
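The decoding step can also be written as a standalone routine that works on NUL-terminated strings and checks its bounds. The following is a sketch of that idea, not the routine used in collect.cgi; the names hex_value and url_decode are hypothetical, and unlike the program above it rejects malformed % sequences instead of substituting a question mark.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode form-encoded text: '+' becomes a space and %XX becomes the
   character with hexadecimal code XX. Returns the decoded length, or
   -1 on a malformed %XX sequence or if dest is too small. */
int url_decode(const char *src, char *dest, size_t dest_size)
{
    size_t out = 0;
    if (dest_size == 0)
        return -1;
    while (*src != '\0') {
        char decoded;
        if (*src == '+') {
            decoded = ' ';
            src++;
        } else if (*src == '%') {
            int hi = hex_value(src[1]);
            /* Look at the second digit only if the first one is valid,
               so the terminating NUL is never skipped. */
            int lo = (hi < 0) ? -1 : hex_value(src[2]);
            if (hi < 0 || lo < 0)
                return -1;
            decoded = (char)(hi * 16 + lo);
            src += 3;
        } else {
            decoded = *src++;
        }
        if (out + 1 >= dest_size)
            return -1;          /* keep room for the trailing NUL */
        dest[out++] = decoded;
    }
    dest[out] = '\0';
    return (int)out;
}
```

Applied to the value part of the example, Hello+there%21, this yields the original text Hello there! again.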
Sample program: view data stored on a file
Finally, we can write a simple program for viewing the data; it only needs to copy the content of a given text file onto standard output:

#include <stdio.h>
#include <stdlib.h>

#define DATAFILE "../data/data.txt"

int main(void)
{
  FILE *f = fopen(DATAFILE, "r");
  int ch;

  if(f == NULL) {
    printf("%s%c%c\n", "Content-Type:text/html;charset=iso-8859-1", 13, 10);
    printf("<TITLE>Failure</TITLE>\n");
    printf("<P><EM>Unable to open data file, sorry!</EM>"); }
  else {
    /* plain text this time, so the header says text/plain */
    printf("%s%c%c\n", "Content-Type:text/plain;charset=iso-8859-1", 13, 10);
    while((ch = getc(f)) != EOF)
      putchar(ch);
    fclose(f); }
  return 0;
}

Notice that this program prints (when successful) the data as plain text, preceded by a header that says this, i.e. has text/plain instead of text/html. A form that invokes that program can be very simple, since no input data is needed:

<form action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/viewdata.cgi">
<div><input type="submit" value="View"></div>
</form>

Finally, here's what the two forms look like. You can now test them:
Form for submitting data
Please notice that anything you submit here will become visible to the world:
Your input (80 chars max.):
Send
The content of the text file to which the submissions are stored will be displayed as plain text.
View
Even though the output is declared to be plain text, Internet Explorer may interpret it partly as containing HTML markup. Thus, if someone enters data that contains such markup, strange things would happen. The viewdata.c program takes this into account by writing the NUL character ('\0') after each occurrence of the less-than character (<), so that it will not be taken (even by IE) as starting a tag.
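The listing shown earlier does not include that extra step. A copy loop with the step added might look like the sketch below; it follows the description in the text, but the function name copy_defanged is made up here and this is not the actual source of viewdata.c.

```c
#include <stdio.h>

/* Copy everything from in to out, writing a NUL byte after every '<'
   so that the character cannot be read as the start of a tag.
   Returns the number of bytes written, or -1 on a write error. */
long copy_defanged(FILE *in, FILE *out)
{
    long written = 0;
    int ch;
    while ((ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF)
            return -1;
        written++;
        if (ch == '<') {
            if (putc('\0', out) == EOF)
                return -1;
            written++;
        }
    }
    return written;
}
```

In the viewing program, the while loop that calls putchar would be replaced by a call such as copy_defanged(f, stdout).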
Further reading
You may now wish to read The CGI specification, which tells you all the basic details about CGI. The next step is probably to see what the CGI Programming FAQ contains. Beware that it is relatively old.
There is a lot of material, including introductions and tutorials, in the CGI Resource Index. Notice in particular the section Programs and Scripts: C and C++: Libraries and Classes, which contains libraries that can make it easier to process form data. It can be instructive to parse a simple data format using code of your own, as was done in the simple examples above, but in practical applications a library routine might be better.
The C language was originally designed for an environment where only ASCII characters were used. Nowadays, it can be used, with caution, for processing 8-bit characters. There are various ways to overcome the limitation that in C implementations, a character is generally an 8-bit quantity. See especially the last section in my book Unicode Explained.

Basic CGI Programming
Written by Valerie Mates, May 18, 1999
CGI programs generate web pages on the fly. When you type text in boxes on a web page and press a button to submit the data, you are running a CGI program. This page describes how to write a CGI program.
The Basics
At its most basic, a CGI program is one that reads an environment variable and writes out ordinary HTML. For example, here is a simple shell script CGI program:
#!/bin/sh
echo "Content-type: text/html"
echo ""
echo "Hello world!"
Ideally, the HTML from that script would include tags like <html> and <head> and <body>, but browsers will know what to do with it even if those are missing.
What Programming Language?
A CGI program can be written in any programming language. This page talks about CGI programming in Perl.
Where do I put the program?
CGI programs need to go in a "cgi-bin" directory. This is a special directory of programs that can be run by the web server. Unfortunately there is no standard location for a cgi-bin directory. To find yours, check your web server configuration or ask your system administrator.
Forms
You can send data to a CGI program from a form, either on a web page or from another CGI program. The HTML for a simple form might look like this:
<form action="/cgi-bin/foo.cgi" method=post>
Greeting: <input type="text" name="greeting" size=10 maxlength=20><br>
Your Name: <input type="text" name="your_name" size=20 maxlength=30><br>
<input type="submit" name="submit" value="Send">
</form>
To use cgi-lib in a Perl program, put it in the same directory as your program (or in your Perl search path) and include the line:
require("cgi-lib.pl");
Once your program calls &ReadParse (as the sample program below does), all the variables on the form are put into a hash named %in. In your program you can refer to the variables like this: $in{'greeting'} and $in{'your_name'}; that is, $in{'name_of_variable'}.
HTML Headers
The first thing a CGI program must do, before displaying any text, is to tell the browser that the program will be sending HTML text. One way to do this is to print the string:
Content-type: text/html
followed by two newlines. The other option is to use a cgi-lib function called PrintHeader, which you do by including this line in your program:
print &PrintHeader;
If you leave out the HTML headers, you will get a web server error.
Using the Variables
Here is a sample Perl program that uses the variables from the form:
#!/usr/local/bin/perl
require("cgi-lib.pl");
&ReadParse;
print &PrintHeader;
print <<EOF;
<html>
<head>
<title>A Greeting From $in{'your_name'}</title>
</head>
<body bgcolor="#FFFFFF">
$in{'your_name'} sends you this greeting:<br>
<blockquote>$in{'greeting'}</blockquote>
</body>
</html>
EOF
When someone runs that program, its output will look something like this:
Jane Smith sends you this greeting: Hello, isn't this weather great?
Handling Errors
Normally in a Perl program, if an error condition occurs, you would use the Perl command die to display an error message and exit. However, you cannot do this in a CGI program. If you do, the error message will be hidden away in a web server error log where the user cannot see it. The user will see only an error message that says something like "500 Server error".
Instead of die, a CGI program should display an intelligent error message and then call exit. I wrote a routine called "crash" that I use. Here is the code for it:
#
# Subroutine to exit gracefully from errors:
#
sub crash {
    print $_[0];
    print "</td></tr></table></td></tr></table> </body></html>";
    exit;
}
The end-of-table code in the crash routine is useful because if you print a table without a </table> tag, the browser won't show anything in the table. The extra tags make sure that even if the crash occurs while you are in the middle of writing out a table, the error message will still be readable. Here is an example of a program that calls crash:
#!/usr/local/bin/perl
require("cgi-lib.pl");
&ReadParse;
print &PrintHeader;
# If greeting is blank, display error message and exit:
if ($in{'greeting'} eq "") {
    crash("Please enter a greeting. Press your browser's Back button to enter it.");
}
print <<EOF;
<html>
<head>
<title>A Greeting From $in{'your_name'}</title>
</head>
<body bgcolor="#FFFFFF">
$in{'your_name'} sends you this greeting:<br>
<blockquote>$in{'greeting'}</blockquote>
</body>
</html>
EOF
#
# Subroutine to exit gracefully from errors:
#
sub crash {
    print $_[0];
    print "</td></tr></table></td></tr></table> </body></html>";
    exit;
}
Debugging Tips
Some techniques that are useful for debugging CGI programs:
1. Use print statements. That is, if you want to know what the variable $in{'foo'} is set to, add a line that says:
print "$in{'foo'}<br>\n";
2. You can run CGI programs from the command line! That is, if the program foo.cgi keeps giving you errors when you run it from the web server, telnet to the server, change directory to your cgi-bin directory, and, at the command line, run your program by typing ./foo.cgi. Perl is good about giving meaningful error messages.
3. Use the "tail" command to look at your web server error log. When your browser is giving you meaningless "500 error" messages, your web server's error log is likely to have a much more useful error message.
4. The error message Premature end of script headers may mean that your program wrote out some other text before the HTML headers.
Security
Question: What is wrong with this line of code?
system("log_to_database $in{'user_data'}");
Answer: This program runs a Unix command with user-supplied data. That is, it runs the command:
log_to_database something
where something could be anything at all. Suppose the user had entered this text: ; rm /. Then the Unix command that would be run is log_to_database ; rm /. That is, by adding a semicolon, the user terminated the log_to_database command and started a second command on the same command line. That second command in this case is (a mild version of) the command to delete all the files on the system.
Since you don't want users running random commands on your system, be very careful what you do with user-supplied data. Avoid passing user-supplied data to system commands. If you must do so, first filter out all possible bad characters from the data, or, better yet, to avoid missing any special characters you haven't thought of, filter out all characters except the ones that are acceptable. For example, the command:
$in{'user_data'} =~ s/[^A-Z0-9]//gio;
will remove all non-alphanumeric characters from the variable $in{'user_data'}. If the user enters ; rm / and you run that substitution, the user's entry will be pared down to only its alphanumeric characters, which in this case are the letters "rm" without the dangerous semicolon. Now you can safely run the command as log_to_database rm, which will merely log the letters "rm" to the database, which is vastly preferable to deleting all the files on your system!
Be careful too about filenames. If the user enters a filename, beware of allowing a carefully placed .. or other special characters to overwrite a file in a directory other than the one where you intended the data to be stored.
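For readers writing their CGI programs in C rather than Perl, the same two precautions can be sketched as below. Both function names (keep_alnum and is_plain_filename) are hypothetical, and the filename rule shown is only one conservative choice among many, assumed here for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Remove, in place, every character that is not a letter or digit
   (as judged by isalnum in the default "C" locale). */
void keep_alnum(char *s)
{
    char *out = s;
    for (; *s != '\0'; s++) {
        if (isalnum((unsigned char)*s))
            *out++ = *s;
    }
    *out = '\0';
}

/*
 * Accept a file name only if it is non-empty, contains no '/' or '\\',
 * and does not consist of dots alone (".", "..").
 * Returns 1 if the name is acceptable, 0 otherwise.
 */
int is_plain_filename(const char *name)
{
    if (name[0] == '\0') return 0;
    if (strchr(name, '/') != NULL || strchr(name, '\\') != NULL) return 0;
    if (strspn(name, ".") == strlen(name)) return 0;
    return 1;
}
```

As in the Perl substitution, the filter lists what is allowed rather than what is forbidden, so a special character nobody thought of is dropped by default.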
General
The Common Gateway Interface (CGI) is a standard for interfacing external applications with Web servers. Unlike a plain HTML document, which returns only static information, a CGI program is executed in real time, so that it can output dynamic information.
1. Instructions to setup CGI programs
2. Setup authenticated CGI programs

Instructions to setup CGI programs
1. Before you can write your own CGI program, you will need to have an account on the Teaching Web Server. If you haven't got an account on the Teaching Web Server yet, please refer to our web page on "Who can apply" for more information.
2. You have to place your CGI programs in a directory called cgi-bin in your public_html directory. Use the following command to create the "cgi-bin" directory first:
mkdir $HOME/public_html/cgi-bin
3. From now on, you can place your CGI programs under the directory $HOME/public_html/cgi-bin, and the URL to access your CGI program is:
http://teaching.ust.hk/cgi-bin/cgiwrap/~course_code/CGI_program_name
4. Examples
   o http://teaching.ust.hk/cgi-bin/cgiwrap/~comp123/testprog.cgi
     or
     http://teaching.ust.hk/cgi-bin/cgiwrap/comp123/testprog.cgi
   o To query the imagemap "mymap.map" in the public_html directory of the course "comp123":
     http://teaching.ust.hk/cgi-bin/imagemap/~comp123/mymap.map
     or
     http://teaching.ust.hk/~comp123/mymap.map (for image map only and file extension must be .map)
5. Note to C programmers
If you are writing your CGI programs in C, the dynamic linker may warn you, when your program runs, that the library you are using is older than expected. Because this warning is usually emitted as the first line of output, it prevents the CGI program from working as expected. The solution is to recompile your C program under Solaris before you run it on the Web, as our Web server is now running Solaris 2.6.
Setup authenticated CGI programs
1. First read Instructions to setup CGI programs in order to understand the basic CGI program setup procedure.
2. Because cgiwrap does not support a .htaccess file placed in your $HOME/public_html/cgi-bin directory, the web server does not read or follow any directives in a .htaccess file there.
3. To execute your CGI program with authentication, you should use the following URL instead:
http://teaching.ust.hk/cgi-bin/auth/cgiwrap/course_code/CGI_program_name
or
http://teaching.ust.hk/cgi-bin/auth/cgiwrap/~course_code/CGI_program_name
4. We have put a generic .htaccess (see below) in the URI /cgi-bin/auth/cgiwrap, so the web server will request authentication when the above URL is accessed. After the correct ITSC account and password are entered, CGI environment variables such as REMOTE_USER and REMOTE_HOST will be passed along to your CGI program.
AuthName HKUST
AuthType Basic
<Limit GET POST>
require valid-user
</Limit>
5. It is your task to check who is authorized to execute the CGI program, based on the variable REMOTE_USER.
6. This setup differs from general web authentication because we use cgiwrap to execute user CGI programs. In traditional web authentication, both authentication (who you are) and authorization (whether you are allowed to do that) are handled or defined by the web server and the .htaccess file. Here, authentication is still handled by the web server, but authorization is your responsibility in the CGI program.
7. Under this setup, please note that you cannot define your own username and password pairs for authentication.
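The authorization check of step 5 could be written in C along the following lines. This is only a sketch under assumptions: the function names are hypothetical, and the list of allowed account names is something your own program would have to supply.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if user appears in the allowed list, 0 otherwise.
 * A NULL or empty user (no authentication took place) is never allowed.
 */
int is_authorized(const char *user, const char *const allowed[], size_t count)
{
    size_t i;
    if (user == NULL || user[0] == '\0') return 0;
    for (i = 0; i < count; i++) {
        if (strcmp(user, allowed[i]) == 0) return 1;
    }
    return 0;
}

/* Check the account name that the web server passes in REMOTE_USER. */
int remote_user_is_authorized(const char *const allowed[], size_t count)
{
    return is_authorized(getenv("REMOTE_USER"), allowed, count);
}
```

A program would call remote_user_is_authorized before doing any work and print an error page instead of its normal output when the result is 0.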
What is CGI?
There seems to be a lot of confusion even among experienced programmers about it. Myths abound. I am sure you have heard at least some of them. Which is why I would like to tell you first what CGI is not.
It is not a programming language. That means, for example:
o You do not have to learn Perl
o You can use the languages you already know
o You can use any language as long as it can read input and can write output
And what computer language can't? For that matter, you do not need to use a language.
It is not a programming style. You can use your own.
It is not cryptic. Perl is cryptic, all right, but see above: You don't need to use Perl.
It is not for Unix gurus only. In fact, you don't have to be any kind of guru. All you need is to know how to program. And you already know that!
NOTE: Please don't misconstrue me. I have nothing against Perl. But from browsing the web you may get the impression you must learn it. All I'm saying is that you don't have to. But if you want to, be my guest.
ANOTHER NOTE: If you don't know anything about programming, you need to learn that first. But you can still continue reading.
1. It sends a line of plain text which explains what kind of file is being sent, i.e. HTML, or GIF, or whatever else.
2. It sends out a blank line.
3. It sends out the contents of the file.
In that order.
the server gets the data from. Nevertheless, a typical server is programmed to get its data from a file. It simply reads the data from the file and sends it to the client during the last of the three steps I talked about before. As a result of this process, the server only sends static data. That is to say, the server does not dynamically modify the data.
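The three steps can be sketched in C as follows. This is a deliberately simplified illustration of the description above, not the code of any real server (a real HTTP response also starts with a status line and ends its header lines with CR LF); the function name send_document is hypothetical.

```c
#include <stdio.h>

/*
 * Send the three parts in order: a line naming the kind of data,
 * a blank line, then the contents read from in.
 * Returns the number of content bytes copied.
 */
long send_document(FILE *out, const char *content_type, FILE *in)
{
    int ch;
    long count = 0;
    fprintf(out, "Content-type: %s\n", content_type);  /* step 1 */
    fputc('\n', out);                                   /* step 2: blank line */
    while ((ch = getc(in)) != EOF) {                    /* step 3: contents */
        putc(ch, out);
        count++;
    }
    return count;
}
```

Nothing in the function depends on the data coming from a file on disk: any stream will do, which is the point the rest of this discussion builds on.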
Now, let us say you would like to send the listing of your current directory to the web (not a good idea, but it shows just how simple it is). Well, MS DOS has the dir command that sends the directory listing to standard output.
The first line of this batch file tells the browser to expect plain text. The second sends a blank line. The third lists the contents of the current directory.
A disclaimer is in order here: Since my web site is on a Unix server, I could not test this. I know you can use Unix shells for similar purposes. I do not know whether Windows servers let you use batch files for CGI. But since more people understand MS DOS batch files than Unix shells, I chose this example.
You can send more data (URL has a size limit).
The data is not logged along the way. Sending a password, for example, as part of the URL leaves a trail in the various systems your data is travelling through!
Data does not appear in the browser Location bar. Again, showing a password there may not be appreciated by the user if someone is watching over his shoulder.
it. For HTML you do it by sending the string Content-type: text/html followed by two line feeds before doing anything else. So, in C, you could code something like
printf("Content-type: text/html\n\n");
Please note the existence of a hidden input with no assigned value, just to test what it sends to the program. You can play with this form and see what it sends to my program below. A thing to note is that it converts any spaces into plusses, and any other non-alphanumeric values into %xx, where xx is the hexadecimal version of its ASCII value. Fortunately, that is fairly simple to fix, and c will also show you the fixed input.
After that, there is one thing left: You need to parse the input. By that I mean, you need to break it apart into pairs of key and value. Each pair is separated by an ampersand (&). Please note I said separated, not terminated. There is no ampersand after the last pair. Within each pair, the key part is on the left of an equal sign (=), while the value is on the right. Pretty much like an assignment in C and many other programming languages.
To illustrate this, c parses the data and shows you the pairs. You will note that the favorite color key seems to have no value. But it does. It just happens to be spaces. Instruct your browser to show you the page source and take a look at the HTML code c produces to see what I am talking about.
Here is the form, play with it as much as you want. Just click BACK after viewing the results to return here.
Feel free to modify the form in any way you want (just copy its source code above), and try it again. But do me a favor: Do it from your own computer. Please do not place it on a web page unless you write your own program to test it with. Please understand that I have a bandwidth limit with my host and it might cost me extra money if you let the whole world use my CGI program from my server. By the way, if you clicked RELOAD while you were in the c program, your browser probably reacted differently when the data was sent by POST from when it was sent by GET. If you did not do that, go back and try it!
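If you do write your own program to test the form with, the parsing step described earlier (pairs separated by ampersands, key and value separated by an equal sign) might be sketched in C like this. It is not the source of the c program; struct pair and split_pairs are hypothetical names, and the pieces it returns are still encoded (plusses and %xx sequences are left untouched).

```c
#include <string.h>

struct pair {
    char *key;
    char *value;
};

/*
 * Split data in place into key/value pairs: pairs are separated by '&',
 * and within a pair the key is to the left of the first '='.
 * Stores at most max pairs and returns how many were stored.
 */
size_t split_pairs(char *data, struct pair *pairs, size_t max)
{
    size_t n = 0;
    char *p = data;
    while (p != NULL && *p != '\0' && n < max) {
        char *amp = strchr(p, '&');
        char *eq;
        if (amp != NULL) *amp = '\0';        /* cut off the current pair */
        eq = strchr(p, '=');
        pairs[n].key = p;
        if (eq != NULL) {
            *eq = '\0';                      /* cut the key off the value */
            pairs[n].value = eq + 1;
        } else {
            pairs[n].value = p + strlen(p);  /* no '=': empty value */
        }
        n++;
        p = (amp != NULL) ? amp + 1 : NULL;  /* separated, not terminated */
    }
    return n;
}
```

Because the function overwrites the separators with NUL characters, it should be given a writable copy of the input, not the string returned by getenv.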
Content-type headers
Here-document quoting
File locations/extensions for running CGI scripts
Testing from the command line
Testing from the Web server
CGI script file permissions
Content-type headers
Now let's modify hello.pl so it will run as a CGI script. Every CGI script needs to output a special header as the first thing the script outputs. This header line is checked by the Web server, then passed on to the remote user invoking the script in order to tell that user's browser what type of file to expect. Most of the time, your script is going to output an HTML file, which means you'll need to output the following header:
print "Content-type: text/html\n\n";
You need to output it exactly like that, including the capital "C" and the lowercase everything else. Please note that there are two newline characters (\n\n) at the end of the header. CGI novices tend to forget that, but it's really important, since the header needs to be followed by a blank line.
So, adding that line to our hello.pl script gives us the following:
#!/usr/local/bin/perl
# hello.pl -- my first perl script!
print "Content-type: text/html\n\n";
print "Hello, world!\n";
Here-document quoting
As long as we're claiming this is HTML that we're outputting, let's go ahead and make our output a valid HTML file:
#!/usr/local/bin/perl
# hello.pl -- my first perl script!
print "Content-type: text/html\n\n";
print <<"EOF";
<HTML>
<HEAD>
<TITLE>Hello, world!</TITLE>
</HEAD>
<BODY>
<H1>Hello, world!</H1>
</BODY>
</HTML>
EOF
Take a careful look at the stuff that replaced the "" characters used to quote the original "Hello, world!\n" line. That <<"EOF"; thing, and the EOF all alone on a line by itself at the end, is being used to quote a multi-line string. Basically, it's being used to indicate what the "print" command should print. This is sometimes called "here-document" quoting; you can call it whatever you want, but it's a real time-saver in CGI scripts. There's nothing special about the "EOF" string I used to delimit my output, by the way; you can use anything you like, as long as it's the exact same at the beginning and end of the quoted string (including capitalization). So I could have said:
and it would have worked fine. Just make sure, again, that "Walnuts" is all by itself on the last line. Even a space character after it will screw things up, as will anything in front of it. It needs to be right at the left margin, with nothing after it but a newline.
Did you follow that? First I used the Unix "mkdir" command to create a new directory called "begperl" in my Web space, which in my particular case is located at "/w1/l/lies". (Trivia: The Unix mkdir command is the only one I can think of that is actually longer than the equivalent DOS command, "md".) I chmodded the directory to permissions 755, then used the "cp" command to copy hello.pl from my home directory to the new directory in my Web space. Then I used the "cd" command to change to that directory, and used the "mv" command to change the file's name from "hello.pl" to "hello.cgi", so the Web server will know that it's a CGI script.
A couple of additional things about directories and Web space and so on: My Unix command prompt has been customized to show my current directory; if yours hasn't, you can use the command "pwd" to list the current directory any time you lose track. Another handy tool is the "~" symbol; when you use that in a command in the Unix shell, it is automatically replaced by the path of your "home" directory, which is the directory you start off in when you first log into the server.
Many ISPs, by the way, use a convention of having users' Web stuff go in a directory called "public_html" beneath their home directory. If that's the case with your ISP, then you'll need to substitute directory names accordingly in the instructions given above, and use a Web address of the form "http://www.your-isp.com/~username/" or "http://www.your-isp.com/username/" to access documents in that public_html directory.
Did you get that? Great! If not, what happened? Did you get a "permission denied" error, by any chance? Then check your permissions: You probably lost the "execute" permission somewhere along the way.
In your case, it might be something like "http://www.your_isp.com/~your_username/hello.cgi". Whatever it is, go ahead and try it.
Internal Server Error The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, jbc@cyberverse.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.
Ack. The dreaded "internal server error". You will see messages like this a lot when you are learning to run CGI scripts. What it means is, something (probably some sort of error message) got printed out by your script before the "Content-type: text/html\n\n" header. It sure would be helpful if you could see that error message.
Fortunately, since your ISP is enlightened enough to let you run your own CGI scripts, they're almost certainly enlightened enough to give you access to the Web server's error log. In my case, it's at /w1/l/lies/.logs/error.log. One way to check that error log is to open up a second telnet window, log into a second shell session on your Unix server, and enter the command "tail -f /path/to/error.log". This will cause your window to display new entries in the error log as they are added. Then you can just pop back and forth from one window to the other to check the error log as you work on your script.
In this case, I'm just going to use my existing shell session, and issue the "tail" command without the -f switch to print out the last 10 lines of the error log, looking for the problem that caused my script to fail. Lo and behold, there it is:
catlow:/w1/l/lies/begperl> tail /w1/l/lies/.logs/error.log
(stuff deleted)
exec of /w1/l/lies/begperl//hello.cgi failed, reason: Permission denied (errno = 13)
[Fri Sep 11 23:50:18 1998] access to /w1/l/lies/begperl//hello.cgi failed for 207.71.222.193, reason: Premature end of script headers
So it was another "Permission denied" error. But wait; the script ran fine when I ran it manually from the command line. What gives?
Now let's try running hello.cgi via the Web server again, either by hitting the "Reload" button in our browser, or just typing the script's address into the Location box again and hitting "Enter":
Hello, world!
All right! Take a break and pour yourself a tall, frosty one. You've earned it.