Cynthia Smith
July 9, 2007
IRLS 540
Introduction
One cultural phenomenon that was underground in the United States for decades and
has been widely acknowledged only within the past decade or so is Japanese animation,
commonly known as Anime. Fans have also been creating stories and art based on Anime
for decades (known as fan fiction and fan art), but since the introduction of the internet
it has become easier than ever for fans to share their works with each other. Many of
these works are created digitally and available only digitally, and they are arguably part
of our cultural heritage from the late 1990s to the present. Yet no one seems to be
concerned with preserving these items of possible cultural interest. Therefore, I propose
the creation of a digital archive whose task will be to preserve these cultural artifacts for
posterity. The purpose of this paper is to examine some of the issues such a project
would encounter, what it would take to implement, and some of the options available.
Specifically, it looks at selection, copyright, cataloging, access, and preservation.
Selection
There are several issues in selecting fan fiction and fan art to preserve. One is
finding these works in the first place. A web crawler could be built specifically to look
for pages tagged or titled as fan fiction or fan art. This could, however, result in some
duplication and some misses. Human monitoring would therefore be necessary as well, to
keep track of what the web crawler is missing and then either improve the crawler or
define points for human intervention. Even with human searching, however, some items
are bound to be missed. The archive could also publicize its existence and outline a
process by which creators of fan fiction and fan art could submit their works directly if
they choose. Part of the problem of archiving fan fiction in particular is that authors of
long works often post them in chapter installments, so a web crawler might need several
passes to capture a whole work, and the first part may already be archived by the time
the final installment is posted. Also, given how easy it is to post to the internet these
days, a work may have been submitted to several different sites, leading the web crawler
to record multiple copies of it. Thus, while web crawlers may take much of the initial
burden off humans, close human monitoring will still be necessary.
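The two crawler problems just described, the same work posted on several sites and long works arriving one chapter at a time, can be illustrated with a small post-crawl merging step. This is a minimal sketch, assuming the crawler has already extracted each page's title, author, chapter number, and text; the `CrawledPage` fields, the function names, and the `.example` URLs are hypothetical. Matching on an exact hash of normalized text will miss copies that differ by even one edited word, which is one reason the human monitoring described above would still be needed.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class CrawledPage:
    """Hypothetical fields a crawler might extract from one posted chapter."""
    url: str
    title: str
    author: str
    chapter: int
    text: str


def fingerprint(text: str) -> str:
    """SHA-256 of the text after collapsing whitespace and case, so a chapter
    reposted with different line breaks still produces the same hash."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_crawl(pages):
    """Group crawled pages into works keyed by (author, title).

    Returns (works, mirrors): works maps each key to {chapter: first page
    seen}; mirrors maps each text fingerprint to every URL carrying that
    text, so duplicate postings are recorded rather than archived twice."""
    works = {}
    mirrors = {}
    for page in pages:
        key = (page.author.lower(), page.title.lower())
        mirrors.setdefault(fingerprint(page.text), []).append(page.url)
        # Keep only the first copy of each chapter number; later copies
        # (including revised ones) are not stored as separate chapters.
        works.setdefault(key, {}).setdefault(page.chapter, page)
    return works, mirrors


def missing_chapters(chapters):
    """Chapter numbers absent between 1 and the highest chapter seen: the
    installments a later crawl, or a human, still needs to find."""
    if not chapters:
        return []
    return [n for n in range(1, max(chapters) + 1) if n not in chapters]


if __name__ == "__main__":
    pages = [
        CrawledPage("http://site-a.example/f/1", "Rain", "Kiri", 1, "Chapter one text."),
        CrawledPage("http://site-b.example/s/9", "Rain", "Kiri", 1, "Chapter  one\ntext."),
        CrawledPage("http://site-a.example/f/3", "Rain", "Kiri", 3, "Chapter three text."),
    ]
    works, mirrors = merge_crawl(pages)
    print(missing_chapters(works[("kiri", "rain")]))  # [2]
    print(len(mirrors[fingerprint("Chapter one text.")]))  # 2
```

A work with a non-empty `missing_chapters` list would be flagged for a later crawl rather than treated as complete.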
Another issue is that the quality of these works ranges from near-professional to
pieces that leave the reader wondering why they were posted at all. Yet sorting the good
from the bad would probably take more time than the archive could afford. Webmasters of
fan fiction sites are of little help, since they rarely apply any quality criteria beyond basic
spelling and grammar checks. There is also the consideration that many of these creators
are young and write well for their level, which may be low at first but may improve over
time. For these reasons, checking every item for quality would be too costly, though there
should be a review process for items whose value is obviously questionable. This process
should have clearly defined steps and clearly defined criteria that have nothing to do with
the content of the piece in question.
There is one content criterion that must be absolute in the archive: any work under
consideration must be related to Anime in some way. This rule may eventually be
reconsidered to allow other fandoms, such as Star Wars or Harry Potter, to be included,
but Anime will be the focus at the start.
There are possible legal problems with content, considering that many potential
visitors to this digital archive may be minors, but these could be handled effectively by a
combination of cataloging, detailed below under Cataloging, and a section of the site
restricted to those over eighteen, detailed below under Access.
Copyright
Most sites that host fan fiction and fan art have copyright statements to the effect
that each work is the property of its creator and that anyone who wishes to use it in any
way (other than reading it on the hosting site) must ask the creator for permission.
Legally, it would probably be best to get permission to archive from each individual,
though this becomes logistically problematic. The archive could simplify some of this by
gaining permission to preserve and display not only all of an author's current works but
also all future works, unless the individual specifically prohibits it. Initiatives could also
be started with prominent fan fiction and fan art sites so that anyone who submits work
would also be asked for permission to have it archived and displayed in the digital library.
Anyone who directly petitions the digital archive to preserve their work will automatically
be asked for archiving and display rights.
In some exceptional cases, an author could request specific restrictions on their
works. For example, a person who wrote fan fiction as an adolescent and later became a
prominent author might want that earlier work restricted for privacy or other reasons. The
digital archive would then do its best to negotiate for the widest possible access while
still respecting the wishes and rights of the creator. In all cases, copyright should be
strictly tracked; to assist with this, it would be advisable to include all copyright
information as part of the metadata.
The archive must also have a clear policy for works whose creator or copyright
holder is unknown, and a procedure to follow once the author becomes known.
Cataloging
The general framework that would be ideal for cataloging this archive is Dublin
Core in XML (Stielow 2003, 113). Other alternatives such as MARC are available, but
Dublin Core has the advantage of being fairly comprehensive without providing more
metadata elements than are probably needed. XML is the up-and-coming language of the
web, and unlike HTML, which handles content and format, it can also handle multimedia.
This gives the archive the greatest current possibility of future expansion. Several
physical copies of the archive catalog will be made and distributed to key personnel and
locations to be kept off site in case of disaster.
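To make the Dublin Core recommendation concrete, the sketch below serializes one fan fiction item as Dublin Core XML, using the standard Dublin Core element namespace and Python's built-in `xml.etree.ElementTree`. It is a minimal sketch, not a full application profile: the `<record>` wrapper element, the function name, and the sample values are hypothetical. It shows the Anime series recorded under `subject`, the file format under `format`, and the copyright permissions under `rights`, in line with the copyright and preservation metadata this paper calls for.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize one item's metadata as Dublin Core XML.

    `fields` maps element names to a string or a list of strings, since
    elements such as subject may repeat. Unknown names raise ValueError."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record({
        "title": "Rain",
        "creator": "Kiri",
        "subject": ["Naruto", "adventure"],
        "format": "text/html",
        "rights": "Creator granted archiving and display rights",
    }))
```

Rejecting names outside the fifteen elements keeps catalogers from drifting into ad hoc fields; an archive needing extra fields (such as content warnings) might instead adopt qualified Dublin Core terms.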
Cataloging for this digital archive has several components to consider. First,
items need to be searchable by the Anime depicted in the work. It may also be worthwhile
to divide works by genre, such as comedy, fantasy, and adventure, for further
searchability, realizing that this will create more work for the cataloger. Considering that
many visitors may be minors, it may also be worthwhile to note which works are
particularly graphic in their depiction of sex, violence, and the like. Since this is all
available online, and no record needs to be kept of who views what, this should avoid a
chilling effect while still warning viewers of content they may not be comfortable with.
Again, this would create more work for catalogers. Some websites that host fan fiction
and fan art already have such genre categories and warnings, which could serve as a basis
for these descriptions and eliminate part of the cataloger's work. As before, the procedures
for this process would need to be explicitly laid out, along with a process for
reconsideration.
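The cataloging scheme above (search by Anime, optional genres, content warnings, and an over-eighteen section) can be sketched as a simple filter over catalog entries. This is a minimal sketch with hypothetical names (`CatalogEntry`, `search`, `adult_only`); how a viewer is established to be over eighteen is outside its scope and is assumed to be decided before `search` is called. In keeping with the point about the chilling effect, the function keeps no record of who searched for what.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    title: str
    series: str  # the Anime the work is based on
    genres: set[str] = field(default_factory=set)    # e.g. {"comedy"}
    warnings: set[str] = field(default_factory=set)  # e.g. {"violence"}
    adult_only: bool = False  # belongs in the over-eighteen section


def search(catalog, series=None, genre=None, exclude_warnings=(), adult_viewer=False):
    """Filter catalog entries by Anime series and genre, hiding works that
    carry any warning the viewer chose to avoid. Adult-only works are
    returned only when adult_viewer is True. Nothing is logged."""
    avoid = set(exclude_warnings)
    results = []
    for entry in catalog:
        if entry.adult_only and not adult_viewer:
            continue
        if series is not None and entry.series.lower() != series.lower():
            continue
        if genre is not None and genre not in entry.genres:
            continue
        if entry.warnings & avoid:
            continue
        results.append(entry)
    return results


if __name__ == "__main__":
    catalog = [
        CatalogEntry("Rain", "Naruto", {"comedy"}),
        CatalogEntry("Storm", "Naruto", {"adventure"}, {"violence"}),
    ]
    for entry in search(catalog, series="naruto", exclude_warnings={"violence"}):
        print(entry.title)
```

Warnings act as opt-out filters chosen by the viewer rather than as blocks imposed by the archive, except for the adult-only flag, which mirrors the restricted section proposed under Access.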
Some works and some websites specifically cater to shonen ai (romanticized love
between boys) and yaoi (shonen ai that is explicitly sexual), which also have female
counterparts (Poitras 2007). All of this content should at least be clearly labeled as such,
as it usually is on fan sites, so that those who are uncomfortable with such stories may
avoid them. Further issues regarding yaoi and similar works are discussed under Access.
Access
There are two problems of access in this digital archive. The first is the general
one of providing as much access as possible while respecting copyright and privacy
rights, much of which has already been discussed above.
The second problem, specific to this archive, is that a large fraction of visitors will
probably be minors. According to US v. ALA, in which the Supreme Court ruled on the
Children's Internet Protection Act, children do not have the right to view obscenity or
child pornography (nor do adults), or anything harmful to minors (US v. ALA 2003).
Because of the nature of some fan fiction and fan art (shonen ai and yaoi especially),
some could be considered harmful to minors. In fact, fan fiction sites that cater to such
audiences often post warnings at the entrances to the sections that house such stories,
and some even prohibit anyone under eighteen from entering those sections. Certainly, the
archive is not in a position to act as a law enforcement agency. On the other hand, the
archive could be found legally negligent if we ignore this issue, especially considering
that a large fraction of our probable clientele will be minors. First, a clear definition of
what is harmful to
external equipment is needed. This file format would also limit future projects to static
words and pictures.
The Adobe Acrobat PDF file format has the advantage of essentially taking a
snapshot of the content and saving it, so little if any formatting or content is lost.
Unfortunately, the software to read these files is proprietary, adding another level of
complication to the needed equipment. Also, PDF does not necessarily capture everything
from the original document. PDF also works primarily with still images, again limiting
future projects.
Part of the reason HTML was created, and the one thing it does well, is to include
format along with content. It is also a non-proprietary, open standard. The one issue with
HTML is that it does not handle multimedia well. Given the nature of this collection,
however, this may not be a problem for now. Also, since many of these documents are
likely already in HTML, keeping them in HTML would mean no loss of data from
conversion to a different file format. Its weakness with multimedia could, however, again
limit future projects.
XML is one of the newest and most promising formats, preserving not only content
but also formatting information, and unlike HTML it can handle multimedia. This means
that in the future the archive could easily expand to archive Anime Music Videos. It also
has the benefit of being non-proprietary. Probably the biggest risk is that it has had the
shortest life thus far, and while its future seems bright today, there is no guarantee that it
will remain so.
In all cases, files would be stored uncompressed to preserve as much of the
artifact's content as possible, keeping in mind that it is unknown what information from
the artifact may be considered important or relevant in the future.
The problem with transferring data to CD or DVD is that it is unknown exactly how
long these discs will keep their information uncorrupted. There is also no guarantee how
long CD and DVD drives will remain in common use, and therefore how long the
equipment needed to read them will be available.
With servers, there are basically two options: migration and emulation. Migration
means continually moving the information into formats and onto media that remain
readable as technology changes. Emulation means making one computer system act like
another. A computer of the future could thus be made to act exactly like a computer used
today, making it capable of reading today's formats, provided the format and the software
to read it are known. The information could then be seen exactly as it is seen today, as
long as all the specifications are known. Fortunately, some of this information, such as
the file format, should already be included in the metadata. Unfortunately, that still leaves
some extra metadata work to be done. Another concern is that emulation is the newer and
less tested of the two methods.
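Migration has two halves: moving files onto new media and moving them into new formats. For the media half, a common safeguard (checksum-based fixity checking, a standard digital preservation practice that this paper does not itself name) is to record a cryptographic hash of every file and confirm it after each copy and at later audits. This also speaks to the worry above about storage silently corrupting data. A minimal sketch using only Python's standard library, assuming the holdings are ordinary files in a directory tree; the function names and paths are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def migrate(src_dir: Path, dest_dir: Path) -> dict[str, str]:
    """Copy every file under src_dir to dest_dir and confirm each copy's
    checksum matches the original. Returns {relative path: checksum} so the
    checksums can be stored with the item's metadata and re-checked later."""
    manifest = {}
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dest = dest_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copies contents and timestamps
        before, after = sha256_of(src), sha256_of(dest)
        if before != after:
            raise OSError(f"checksum mismatch after copying {rel}")
        manifest[rel.as_posix()] = before
    return manifest


def audit(dest_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose checksum no longer matches the
    manifest, i.e. files damaged since they were migrated."""
    bad = []
    for rel, expected in manifest.items():
        path = dest_dir / rel
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(rel)
    return bad


# Hypothetical usage when moving to a new server:
#   manifest = migrate(Path("holdings_old"), Path("holdings_new"))
#   damaged = audit(Path("holdings_new"), manifest)  # run periodically
```

Format migration (for example, HTML to XML) is a separate step; the manifest only guarantees that the bits survived the move unchanged.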
In this case, a tiered approach would be best. Testing should be done to compare
HTML and XML. Since most of the material is likely to be in HTML, staying with this
format would save time. However, XML would allow greater flexibility for the archive's
future projects. Subject to testing, HTML is the recommended format for initial use. A
secondary copy of the archive's holdings could be made in XML as time and resources
allow; one consideration here is how easily HTML can be converted to XML. As for
storage, servers would be the preferred medium, migrating from one type of server to the
next as technology progresses. However, emulation should be reconsidered and
implemented once it proves feasible, cost-effective, and reliable. It is also recommended
that a duplicate server be kept and maintained off site in case of emergency. This
duplicate server would house the XML copy of the holdings. If the archive decides to
start archiving multimedia works (Anime Music Videos), the XML copy would be brought
to the front. Technological conditions at that point would determine whether the
secondary copy should be a straight XML copy or whether another file type would be
more suitable.
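How easily HTML converts to XML depends mostly on how loosely the source pages are written: browsers tolerate unclosed `<p>` and `<li>` tags and bare `<br>` elements that an XML parser rejects. The sketch below rebuilds such HTML as well-formed XML using only Python's standard library (`html.parser` and `xml.etree.ElementTree`). It is a minimal sketch under stated assumptions: the `<document>` wrapper element and the class and function names are hypothetical, it handles only a few of HTML's implied-closing rules, and it drops comments and the doctype. A production conversion for the archive would more likely rely on a dedicated HTML-tidying tool.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # HTML lets <p> and <li> be left unclosed; a new one closes the old.
        if tag in ("p", "li") and self.stack[-1].tag == tag:
            self.stack.pop()
        # Valueless attributes (e.g. "checked") become checked="checked".
        el = ET.SubElement(self.stack[-1], tag,
                           {k: (v if v is not None else k) for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly closing
        # anything still open inside it; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # Text belongs after the last child if there is one, else inside.
        parent = self.stack[-1]
        if len(parent):
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html: str) -> str:
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


if __name__ == "__main__":
    print(html_to_xml("<p>One<br>two<p>Three &amp; four</p>"))
    # <document><p>One<br />two</p><p>Three &amp; four</p></document>
```

Running a converter like this over a test sample, and checking how often its output needs manual correction, is one way the HTML-versus-XML testing recommended above could be carried out.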
My Recommendations
As far as selection is concerned, the main consideration will be whether the
artifact deals with Anime in any way, although there will also be a process by which
artifacts may be brought back for reconsideration under closer scrutiny. This is because
of the time it would take to scrutinize each artifact individually, and also because of the
difficulty of setting quality criteria for such a wide range of submitters.
Bibliography