http://cpansearch.perl.org/src/JMCNAMARA/Spreadsheet-WriteExcel...
The following article appeared in The Perl Journal, Fall 2000. It is reprinted here by kind permission of Jon Orwant and The Perl Journal. Copyright (c) 2000, The Perl Journal.
Spreadsheet::WriteExcel
John McNamara

Resources:

    Spreadsheet::WriteExcel    CPAN
    OLE::Storage               CPAN
    Win32::OLE                 CPAN
    XML specs for Excel        http://msdn.microsoft.com/library/officedev/
    Gnumeric                   http://www.gnumeric.org
    HTML::TableExtract         CPAN
    Excel SDK newsgroup        news://microsoft.public.excel.sdk
    OLE Compound File          http://user.cs.tu-berlin.de/~schwartz/pmh/guide.html
    Herbert                    http://user.cs.tu-berlin.de/~schwartz/pmh/
    Filters                    http://arturo.directmail.org/filtersweb/
    xlHtml                     http://www.xlhtml.org/

One of Perl's great strengths is the ability to filter data from one format into another. Data goes in one end of a Perl program and miraculously comes out the other end as something more useful. Your Sybase file goes into Perl counselling and after a few short sessions comes out feeling like a brand new Oracle file.

However, not all file formats are readily accessible. Certain proprietary file formats, and in particular binary files, can be difficult to handle. One such format is the Microsoft Excel spreadsheet file.

Excel is the spreadsheet application at the heart of the Microsoft Office suite. It is a popular tool for data analysis and reporting, and even though it is only available on Windows and Macintosh platforms there is often a requirement to produce Excel-compatible files on Unix platforms. (Several rumors and some evidence of a Linux port of Microsoft Office have recently come to light on Slashdot.)
This article describes Spreadsheet::WriteExcel, a cross-platform Perl module designed to write data in the Microsoft Excel binary format. It highlights the fact that although Perl is most often associated with text files, it can readily handle binary files as well. This article also looks at alternative methods for producing Excel files and suggests some methods for reading them.
Using Spreadsheet::WriteExcel
A single Excel file is generally referred to as a workbook. A workbook is composed of one or more worksheets, which are pages of data in rows and columns. Each row and column position within a worksheet is referred to as a cell. Spreadsheet::WriteExcel creates a new workbook to which you can add new worksheets. You can then write text and numbers to the cells of these worksheets. The following Perl program is a simple example:
    #!/usr/bin/perl -w

    use strict;
    use Spreadsheet::WriteExcel;

    # Create a new Excel workbook called perl.xls
    my $workbook  = Spreadsheet::WriteExcel->new("perl.xls");
    my $worksheet = $workbook->addworksheet();

    # Write some text and some numbers
    # Row and column are zero indexed
    $worksheet->write(0, 0, "The Perl Journal");
    $worksheet->write(1, 0, "One"             );
    $worksheet->write(2, 0, "Two"             );
    $worksheet->write(3, 0, 3                 );
    $worksheet->write(4, 0, 4.0000001         );
Figure 1: Example file written with Spreadsheet::WriteExcel
What is happening here is that we are using the Spreadsheet::WriteExcel module to create a variable that acts like an Excel workbook. We add a single worksheet to this workbook and then write some text and numbers. Figure 1 shows how the resulting file looks when opened in Excel.

The Spreadsheet::WriteExcel module provides an object-oriented interface to a new Excel workbook. This workbook is an object (a variable) that acts as a container for worksheet objects (more variables), which themselves provide methods (functions) for writing to their cells. The primary method of the module is the new() constructor, which takes a filename as its argument and creates a new Excel workbook:
$workbook = Spreadsheet::WriteExcel->new($filename);
The workbook is then used to create new worksheets using the addworksheet() method:
$worksheet = $workbook->addworksheet($sheetname);
If no $sheetname is specified, the general Excel convention for worksheet naming will be followed: Sheet1, Sheet2, and so on. The worksheets are stored in an array called @worksheets which can be accessed through the workbook object. In a multi-sheet workbook you can select which worksheet is initially visible with the activate() method.

The worksheet objects provide the following methods for writing to cells:
    write($row, $column, $token)
    write_number($row, $column, $number)
    write_string($row, $column, $string)
The write() method is an alias for one of the other two write methods. It calls write_number() if $token looks like a number according to the following regex:
$token =~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
Otherwise it calls write_string(). If you know in advance what type of data needs to be written, you can call the specific method, and otherwise you can just use write(). Here is another example that demonstrates some of these features:
    #!/usr/bin/perl -w

    use strict;
    use Spreadsheet::WriteExcel;

    # Create a new Excel workbook
    my $workbook = Spreadsheet::WriteExcel->new("regions.xls");

    # Add some worksheets
    my $north = $workbook->addworksheet("North");
    my $south = $workbook->addworksheet("South");
    my $east  = $workbook->addworksheet("East");
    my $west  = $workbook->addworksheet("West");

    # Add a caption to each worksheet
    foreach my $worksheet (@{$workbook->{worksheets}}) {
        $worksheet->write(0, 0, "Sales");
    }

    # Write some data
    $north->write(0, 1,
    $south->write(0, 1,
    $east->write (0, 1,
    $west->write (0, 1,
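The dispatch rule that write() applies can be tried out without creating a workbook. The following is a minimal sketch that reuses the regex quoted earlier; the helper name looks_like_excel_number() is hypothetical and is not part of the module's interface.

```perl
#!/usr/bin/perl -w

use strict;

# Hypothetical helper (not part of Spreadsheet::WriteExcel): returns 1 if
# write() would treat $token as a number, using the regex quoted in the text.
sub looks_like_excel_number {
    my $token = shift;
    return $token =~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/ ? 1 : 0;
}

# Show which write method each sample token would be routed to
foreach my $token ("3", "4.0000001", "-.5", "1e6", "One", "1,000", "") {
    my $method = looks_like_excel_number($token)
               ? "write_number"
               : "write_string";
    printf "%-12s => %s\n", "'$token'", $method;
}
```

Tokens such as "1,000" or an empty string fail the test and would be written as strings, so data with thousands separators probably needs cleaning up before it is passed to write().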
You can also create a new Excel file using the special Perl filehandle -, which redirects the output to STDOUT. This is useful for CGI programs generating data with a content-type of application/vnd.ms-excel.
    #!/usr/bin/perl -w

    use strict;
    use Spreadsheet::WriteExcel;

    # Send the content type
    print "Content-type: application/vnd.ms-excel\n\n";

    # Redirect the output to STDOUT
    my $workbook  = Spreadsheet::WriteExcel->new("-");
    my $worksheet = $workbook->addworksheet();

    $worksheet->write(0, 0, "The Perl Journal");
The Spreadsheet::WriteExcel module also provides a close() method which can be used to close the Excel file explicitly. As usual, the file will be closed automatically when the object reference goes out of scope or when the program ends. Finally, the following is a slightly more useful example - a Perl program that converts a tab-delimited file into an Excel file:
    #!/usr/bin/perl -w

    use strict;
    use Spreadsheet::WriteExcel;

    # Check for valid number of arguments: exactly two are required
    if (@ARGV != 2) {
        die("Usage: tab2xls tabfile.txt newfile.xls\n");
    }

    # Open the tab-delimited file
    open (TABFILE, $ARGV[0]) or die "$ARGV[0]: $!";

    # Create a new Excel workbook
    my $workbook  = Spreadsheet::WriteExcel->new($ARGV[1]);
    my $worksheet = $workbook->addworksheet();

    # Row and column are zero indexed
    my $row = 0;

    while (<TABFILE>) {
        chomp;
        # Split on single tab
        my @Fld = split(/\t/, $_);

        my $col = 0;
        foreach my $token (@Fld) {
            $worksheet->write($row, $col, $token);
            $col++;
        }
        $row++;
    }
The BIFF data is stored along with other data in an OLE Compound File. This is a structured storage format that acts like a filesystem within a file. A Compound File is composed of storages and streams which, to follow the file system analogy, are like directories and files. This is shown schematically in Figure 3.
Figure 3: The Compound File system used to store Excel data.
One effect of the file system structure is that the BIFF data within the Compound Files is often fragmented, and the files occasionally contain lost blocks of data. The location of the data within a Compound File is controlled by a file allocation table (FAT).

The documentation for the OLE::Storage module contains one of the few descriptions of the OLE Compound File in the public domain, at http://user.cs.tu-berlin.de/~schwartz/pmh/guide.html. The source code for the Gnumeric spreadsheet Excel plugin also contains information relevant to the Excel BIFF format and the OLE container at http://www.gnumeric.org/.
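A quick way to recognise an OLE Compound File is by its first eight bytes. The sketch below assumes the commonly documented signature D0 CF 11 E0 A1 B1 1A E1, which is not given in this article, and the helper name is_compound_file() is hypothetical. It only inspects the header and makes no attempt to follow the FAT.

```perl
#!/usr/bin/perl -w

use strict;

# Assumption: OLE Compound Files begin with this 8-byte signature
my $OLE_SIGNATURE = pack("C8", 0xD0, 0xCF, 0x11, 0xE0,
                               0xA1, 0xB1, 0x1A, 0xE1);

# Hypothetical helper: returns 1 if the file starts with the signature
sub is_compound_file {
    my $filename = shift;

    open(my $fh, '<', $filename) or die "$filename: $!";
    binmode($fh);    # read raw bytes on every platform

    my $header = '';
    my $count  = read($fh, $header, 8);
    close($fh);

    return (defined $count && $count == 8 && $header eq $OLE_SIGNATURE)
         ? 1 : 0;
}

# Report on any files named on the command line
foreach my $filename (@ARGV) {
    printf "%s: %s\n", $filename,
        is_compound_file($filename) ? "OLE Compound File" : "something else";
}
```

Run against a file saved by Excel 5 or 97 this should report an OLE Compound File, while a tab-delimited text file should not match.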
pack is used to write the BOF binary record in the following subroutine from Spreadsheet::WriteExcel:
    sub _store_bof {

        my $self    = shift;
        my $name    = 0x0809;
        my $length  = 0x0008;

        my $version = $BIFF_version; # 0x0500 for Excel 5
        my $type    = $_[0];         # 0x05 = workbook, 0x10 = worksheet

        my $build   = 0x096C;
        my $year    = 0x07C9;

        my $header  = pack("vv",   $name, $length);
        my $data    = pack("vvvv", $version, $type, $build, $year);

        $self->_prepend($header, $data);
    }
The string written to the Excel file looks like this in hexadecimal:
09 08 08 00 00 00 10 00 00 00 00 00
The v template produces a two-byte integer in little-endian order regardless of the native byte order of the underlying hardware. Since the majority of the BIFF and OLE data in an Excel file is composed of little-endian integers, it's possible to write a cross-platform binary file with very little effort. The complementary function for reading fixed format structures is unpack.

Perl is most often associated with text processing, but has features that handle binary data in a relatively straightforward manner. One problem I encountered was with the binary representation of a floating-point number, since Excel requires a 64-bit IEEE float. pack provides the d template for a double precision float, but its format depends on the native hardware. If Spreadsheet::WriteExcel cannot generate the required number format, it will croak() with an error message. During installation, make test will also catch this. Nobody has reported a problem yet, probably because the owners of PDPs or Crays are involved in real computing and aren't interested in such fripperies as Microsoft Excel.

There is one feature of writing binary files that traps everyone at least once. Consider the following example, which writes the Excel end-of-file record identifier, 0x000A. What file size is printed out?
    #!/usr/bin/perl -w

    use strict;

    open (TMP, "+> testfile.tmp") or die "testfile.tmp: $!";
    print TMP pack("v", 0x000A);
    seek (TMP, 0, 1);

    my $filesize = -s TMP;
    print "Filesize is $filesize bytes.\n";
The answer depends on your operating system. On Unix the answer is 2, and on Windows the answer is 3. This is because 0x0A is the newline character, \n, which the Windows I/O libraries will translate to 0x0D 0x0A or \r\n. This is a "feature" of Windows, not Perl. To write a binary file with exactly the data you want and nothing else, you need to use the binmode() function on the filehandle.
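A minimal sketch of the fix, written for this reprint rather than taken from the module: it writes a complete end-of-file record with binmode() in place and reads it back with unpack. The record is assumed to be the identifier 0x000A followed by a zero length, and the file name is arbitrary.

```perl
#!/usr/bin/perl -w

use strict;

my $file = "testfile.tmp";

# Write the record: identifier 0x000A followed by a length of 0 (assumed)
open(my $out, '>', $file) or die "$file: $!";
binmode($out);    # suppress newline translation on every platform
print $out pack("vv", 0x000A, 0x0000);
close($out) or die "$file: $!";

my $filesize = -s $file;

# Read the four bytes back and decode them with the complementary unpack
open(my $in, '<', $file) or die "$file: $!";
binmode($in);
read($in, my $record, 4);
close($in);
unlink($file);

my ($name, $length) = unpack("vv", $record);

printf "Filesize is %d bytes, record 0x%04X, length %d\n",
       $filesize, $name, $length;
```

With binmode() applied to both filehandles the program should report a file size of 4 bytes on Unix and Windows alike, and the identifier comes back as 0x000A whatever the native byte order.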
For us, "Application" means Excel. In other contexts it might mean Word or PowerPoint. Spreadsheet::WriteExcel mimics this hierarchy with five classes, each in its own package. For ease of development, each package is contained in its own module.
    WriteExcel    The main module
    Workbook      A container for worksheets
    Worksheet     Provides the write methods
    BIFFwriter    Writes data in BIFF format
    OLEwriter     Writes data into an OLE storage
The interaction of these packages is shown as low-tech UML in Figure 4. Only the documented public methods are included.
Figure 4: The structure of the Spreadsheet::WriteExcel module.
The relationships can be described as follows: WriteExcel is a Workbook. Workbook is a container for Worksheets, and it uses the OLEwriter class. Workbook and Worksheet are both derived from the
Win32::OLE
As is often said, only perl can parse Perl. Similarly, only Excel can grok and spew Excel. Tackling the binary file head on is fine up to a certain point. After that it's best to leave the dirty work to Excel. By far the most powerful method of accessing an Excel file for either reading or writing is through OLE and OLE Automation. Automation is the process by which OLE objects, such as Excel, act as servers and allow other applications to control their functionality. When applied to the Microsoft Office suite of applications, this process is known as Office Automation. The following is a textual description of how you might use Automation with Excel:

    Request Excel to start
    Request Excel to write some cells
    Request Excel to save the file
    Request Excel to close

To do this in Perl requires a Windows platform, the Win32::OLE module, and an installed copy of Excel. Here is an example:
    #!/usr/bin/perl -w

    use strict;
    use Cwd;
    use Win32::OLE;

    my $application = Win32::OLE->new("Excel.Application");
    my $workbook    = $application->Workbooks->Add;
    my $worksheet   = $workbook->Worksheets(1);

    $worksheet->Cells(1,1)->{Value} = "The Perl Journal";
    $worksheet->Cells(2,1)->{Value} = "One";
    $worksheet->Cells(3,1)->{Value} = "Two";
    $worksheet->Cells(4,1)->{Value} = 3;
    $worksheet->Cells(5,1)->{Value} = 4.0000001;

    # Add some formatting
    $worksheet->Cells(1,1)->Font->{Bold}       =
    $worksheet->Cells(1,1)->Font->{Size}       =
    $worksheet->Cells(1,1)->Font->{ColorIndex} =
    $worksheet->Columns("A:A")->{ColumnWidth}  =

    # Get current directory using Cwd.pm
    my $dir = cwd();

    $workbook->SaveAs($dir . '/perl_ole.xls');
    $workbook->Close;

Figure 5: An example file written with Win32::OLE and Excel.
The result is shown in Figure 5. Without the formatting code, this program produces an Excel file which is almost identical to the one shown in Figure 1.
There are some issues that we've skirted here, particularly in relation to starting and stopping an OLE server. A more detailed introduction to the Win32::OLE module is given by Jan Dubois in TPJ #10 at http://www.itknowledge.com/tpj/issues/vol3_2/tpj0302-0008.html. For additional examples see http://www.activestate.com/Products/ActivePerl/docs/faq/Windows/ActivePerl-Winfaq12.html and http://www.activestate.com/Products/ActivePerl/docs/site/lib/Win32/OLE.html.

As a brief diversion, the following program uses Win32::OLE to expose the flight simulator Easter Egg in Excel 97 SR2.
    #!/usr/bin/perl -w

    use strict;
    use Win32::OLE;

    my $application = Win32::OLE->new("Excel.Application");
    my $workbook    = $application->Workbooks->Add;
    my $worksheet   = $workbook->Worksheets(1);

    $application->{Visible} = 1;

    $worksheet->Range("L97:X97")->Select;
    $worksheet->Range("M97")->Activate;

    my $message = "Hold down Shift and Ctrl and click the ".
                  "Chart Wizard icon on the toolbar.\n\n".
                  "Use the mouse motion and buttons to control ".
                  "movement. Try to find the monolith. ".
                  "Close this dialog first.";

    $application->InputBox($message);
Obtaining Spreadsheet::WriteExcel
The latest version of the module will always be available at CPAN, at http://search.cpan.org/search?dist=Spreadsheet-WriteExcel. ActivePerl users can download and install the module using PPM as follows:
    C:\> ppm
    PPM> set repository tmp http://homepage.eircom.net/~jmcnamara/perl
    PPM> install Spreadsheet-WriteExcel
    PPM> quit
    C:\>
__END__
John McNamara (jmcnamara@cpan.org) works as a software developer for Tecnomen Ltd. Apart from the usual things that engage us all, his main interest in life is the Saab 900 series. He lives in Limerick, Ireland.