second, FileMerge, is part of the free Apple Xcode developer package. Both programs feature intuitive graphic interfaces, and neither requires any special skills. Using these programs, it is possible to maintain control of the different versions of the websites you’re tracking and to quickly appraise new versions to determine if they are different from previous iterations.

Step 1: Managing backups with Time Machine. As mentioned, Time Machine comes preinstalled on Macs with operating system versions 10.5 or higher. Most Macs purchased since late 2007 should come with this program, which creates incremental backups, meaning that each new backup does not override previous versions. This feature allows you to track down and restore files that, for example, you may have accidentally deleted from your hard drive. The program can be set up to back up your entire hard drive, multiple files, or a single file onto an external drive.

It is possible to achieve the same end results without using Time Machine, but using Time Machine has two significant benefits:

[...] Time Machine may immediately start a backup. If not, click on the clock icon in the menu bar on the upper right of your screen and scroll to “Back Up Now.” You can continue working while the computer is performing a backup. It is wise to make sure your external hard drive has plenty of space since when the disk becomes full, Time Machine will begin to erase the oldest versions to make room for new ones.

After collecting a number of versions of your website, you may want to compare these versions and make appraisal decisions about which are worth keeping and which you can put in the trash. To retrieve older versions of your website from Time Machine, make sure your external hard drive is plugged in and turned on. Click the icon and press “Enter Time Machine.” A Finder window (which shows your files and folders) will drop down in front of an identical stack of windows on an “outer space” backdrop.

Enter the name or partial name of your file or folder in the spotlight search box on the upper right of the Finder window. Time Machine searches for the files and presents the most recent version in the Finder window at the front. To go to earlier versions, press the back arrow on the lower right of the screen. You can see every version saved along with the date it was saved. Select the versions you’re interested in and copy and paste each one to the desktop. Rename each folder, adding the date of the files at the end, e.g., “Library_Website_June29.” Do not press the “Restore” button on the bottom right of the screen. If you do so, the version of the files that you are looking at will replace the latest version, and you may lose some files.

The folders containing the files are now clearly labeled and ready to be compared. Time Machine is limited in this way; it holds versions but cannot show you what changes have been made. To do this you need the program FileMerge.
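If you retrieve many versions, the copy-and-rename step can also be scripted. The following is only a minimal Python sketch of the naming convention used in the “Library_Website_June29” example; the function names `dated_name` and `copy_version` are hypothetical, and the script assumes you have already copied or located each retrieved folder yourself. It does not talk to Time Machine.

```python
import shutil
from datetime import date
from pathlib import Path

# Month names are spelled out here so the result does not depend on locale settings.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def dated_name(base: str, when: date) -> str:
    """Build a folder name such as 'Library_Website_June29'."""
    return f"{base}_{MONTHS[when.month - 1]}{when.day}"


def copy_version(source: Path, dest_dir: Path, when: date) -> Path:
    """Copy one retrieved folder into dest_dir under a date-suffixed name."""
    target = dest_dir / dated_name(source.name, when)
    # copytree raises FileExistsError if the target already exists,
    # so an earlier copy is never silently overwritten.
    shutil.copytree(source, target)
    return target


# Example call (hypothetical paths and date):
# copy_version(Path("Library_Website"), Path.home() / "Desktop", date(2010, 6, 29))
```

Because the script works on copies, the original backups stay untouched, which is the same precaution as avoiding the “Restore” button.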
Step 2: Comparing versions of
the site with FileMerge. FileMerge
is included in Apple’s Xcode developer
package, which is freely available at
http://developer.apple.com/tools/xcode.
The program compares two similar
folders and shows any differences be-
tween them. When you run the appli-
cation, it will prompt you for two fold-
ers or files, labeled “right” and “left.”
Begin by selecting the first and second
versions of the site you have archived.
When you press the “Compare” button,
FileMerge will display a folder struc-
ture, showing all files that appear in
either folder. Files that are identical in
both folders will appear in gray. Files
that are only present in one folder or
the other, or which are not exactly the
same in each folder, will appear in
black. The check boxes at the upper
right of the window allow you to
choose which categories of file will be
displayed.
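For readers who prefer a script, the same kind of folder comparison can be approximated with Python’s standard `filecmp` module. This is a sketch of the idea, not a description of how FileMerge works internally; the function name `compare_folders` and the category labels are assumptions chosen for illustration.

```python
import filecmp


def compare_folders(left: str, right: str) -> dict[str, list[str]]:
    """Recursively sort entries into identical / different / left-only / right-only."""
    result = {"identical": [], "different": [], "left_only": [], "right_only": []}

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        # same_files and diff_files cover files present in both folders.
        result["identical"] += [prefix + name for name in cmp.same_files]
        result["different"] += [prefix + name for name in cmp.diff_files]
        # left_only and right_only list names found on one side only;
        # these may be files or whole subfolders.
        result["left_only"] += [prefix + name for name in cmp.left_only]
        result["right_only"] += [prefix + name for name in cmp.right_only]
        # Descend into subfolders that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, f"{prefix}{name}/")

    walk(filecmp.dircmp(left, right), "")
    return result


# Example call (hypothetical folder names):
# report = compare_folders("Library_Website_June22", "Library_Website_June29")
# print(len(report["different"]), "files changed")
```

One caveat: by default `filecmp` treats two files as identical when their type, size, and modification time all match, without reading their contents, so a careful appraisal may still call for a closer look at individual files.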
You may wish to make an appraisal
decision based simply on how many
files have been added, removed, or
changed from one version of the site to
another. Depending on the goals of your
archive, you may decide that a new ver-
sion of the site only merits retention if
it includes more than 10 new files, or
that any change at all represents a new
version that should be kept.
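An appraisal rule of this kind is easy to write down explicitly, which helps apply it consistently across versions. The sketch below is one possible formulation in Python; the names `ChangeSummary` and `merits_retention` are hypothetical, and the counts are assumed to come from whatever comparison you have already run.

```python
from dataclasses import dataclass


@dataclass
class ChangeSummary:
    """Counts of files added, removed, or changed between two versions."""
    added: int
    removed: int
    changed: int


def merits_retention(summary: ChangeSummary,
                     min_new_files: int = 10,
                     keep_on_any_change: bool = False) -> bool:
    """Decide whether a new version of the site should be kept."""
    if keep_on_any_change:
        # Policy: any change at all represents a new version worth keeping.
        return (summary.added + summary.removed + summary.changed) > 0
    # Policy: keep only if the version includes more than min_new_files new files.
    return summary.added > min_new_files


# Example: 11 new files passes the default "more than 10 new files" rule.
# merits_retention(ChangeSummary(added=11, removed=0, changed=3))
```

The threshold of 10 is simply the example given above; the right value depends on the goals of your archive.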
Figure caption: Here are two screen shots of FileMerge. We asked the program to compare the April 6 and April 2 versions of the sahughes files. The program lists individual files and notes differences between the two versions. Here, we’ve highlighted the file “travel.html,” which has four changes. We then double-clicked on travel.html, and a window popped up, showing the newer and the older versions of the file in columns next to each other.

If you are familiar with HTML, you can also use FileMerge to view more specifically what has changed about a page without having to read the entire contents of the page. To do this, select the page in FileMerge and choose “Comparison” from the drop-down menu “View.” FileMerge will display the code of the two versions side by side, with each change highlighted and enumerated. Using this information, you may find it easier to make a precise determination about whether changes are significant or minor. Of course, what constitutes a significant change will
depend on the nature of your project. If you determine that a version of the site is not different enough to keep, simply delete it from your computer and/or external hard drive.

Presentation

Once you have decided to retain a particular version of a site, you’re ready to make that version available to your patrons. From a technical standpoint, you can serve the files by any method you would use for other HTML files: You can host them on a local computer or network, lend them out on a CD-R, or even place them on a public web server. Using any of these methods, it is important to direct the user first to the index file (either index.html, index.htm, or index.php). This can be achieved by written instructions or by linking directly to the file from a menu. Care should be taken to avoid violating the copyright of content owners. Though the law in this area remains somewhat unsettled, a useful and brief guide is the Oakland Archive Policy (http://www2.sims.berkeley.edu/research/conferences/aps/removal-policy.html), which has been adopted by the Internet Archive.

Notes on Blogs and Other Database-Backed Sites

SiteSucker, the software we’ve used to harvest websites, was primarily designed to work on sites based on plain HTML. When you harvest a plain HTML site, the end product should mirror what was originally on the web server: the same files in the same folder structure. However, more and more sites these days are designed using software that holds web content in a database and generates pages dynamically, based on the user’s request. This includes sites created using either blogging software or a content management system (CMS) such as Joomla.

In order to mirror these sites exactly as they exist on their servers, SiteSucker would have to download the database software from the server and run it on your computer. For many reasons, this is not possible. What SiteSucker will do instead when presented with this type of site is to follow every link on every page. In database-backed sites, this can produce thousands of possibilities because each piece of content can often be displayed in a variety of contexts.

SiteSucker’s settings can help limit your archive to a reasonable number of files, either by setting a hard limit on the number of files downloaded or by limiting the number of links the program will follow in succession, using the levels setting. Limiting levels to two will return only the homepage and files linked directly from it.

Sites created on many blogging and CMS platforms do not render correctly when downloaded by SiteSucker. This is often because the formatting information for these sites is contained in scripts that SiteSucker cannot harvest or interpret. Whether a particular platform will work with SiteSucker depends on the details of the software itself. Fortunately, some of the most popular blogging platforms, including WordPress and Blogger, rendered correctly in our tests. For blogs created in other software, we were able to archive the content, but the formatting information was not retained. SiteSucker can also be used to harvest the main page of a Twitter account, though it cannot reach archived posts.

Conclusion

Archiving websites can be extremely useful, but simply owning the files doesn’t guarantee they will continue to be usable. If you want to maintain your web archive into the future, you will need to think about long-term preservation. This is a complex issue, even for relatively stable file formats such as HTML. But it can be simplified by timely and thoughtful action. Two great sources of information on the topic are the Digital Curation Centre in the U.K. (www.dcc.ac.uk) and Australia’s PADI (Preserving Access to Digital Information; www.nla.gov.au/padi). An excellent first step is to maintain consistent and detailed metadata about the files you are archiving. Simply knowing the format and the creation date of a file may make the difference between a usable file and an unusable one, 20, 50, or 100 years from now.

Katharine Dunn (katharine.dunn@simmons.edu) is a graduate student in library and information science at Simmons College in Boston. Dunn, who is also a freelance magazine writer, works as the school’s editorial fellow. Nick Szydlowski is a library assistant in preservation services at the MIT Libraries. He is currently pursuing an M.L.S. from the Simmons College Graduate School of Library Science. He can be reached at nick_s@mit.edu.