I hear and I forget. I see and I remember. I do and I understand. (Confucius)
Ensuring that digitally encoded information remains usable and understandable over time is, together with authenticity, at the heart of digital preservation. The previous chapter discussed some of the formal aspects of intelligibility. This chapter discusses the complementary issue of usability of the data.
Usable means capable of use (OED) or available or convenient for use (www.dictionary.com). In design, usability is the study of the ease with which people can employ a particular tool or other human-made object in order to achieve a particular goal. In human-computer interaction and computer science, usability studies the elegance and clarity with which the interaction with a computer program or a web site is designed (Wikipedia). Here, by usable we mean that someone is able to do something sensible with the information that the data contains. We recognise that this might not be easy, but at least it should be possible.
One could of course use a digital object simply by printing out its constituent sequences of 1s and 0s on paper and using this to decorate one's home. However, it seems reasonable to suppose that this has little to do with the information content of the digital object, unless of course that is what it was designed for.
For example, the Arecibo message [130] was designed to be understood by extraterrestrials. It consisted of a sequence of 1,679 bits which, if displayed as 73 rows by 23 columns, looks like Fig. 9.1 (the shading has been added on the right to make the different parts of the image clearer). The idea is that, even with no shared cultural or linguistic roots, one can rely on basic counting, an awareness of prime numbers, elements, chemistry and physics which any being able to receive the message might reasonably be expected to possess. It is not clear how many human recipients could decipher the message without help!
D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_9, © Springer-Verlag Berlin Heidelberg 2011
Fig. 9.1 Arecibo message as 1s and 0s (left) and as pixels both black and white (centre) and with shading added (right)
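The reliance on counting and prime numbers can be made concrete: 1,679 is the product of the two primes 23 and 73, so only two non-trivial rectangular grids hold the message exactly. The following Java sketch illustrates that arithmetic. The class and method names (AreciboLayout, rectangularLayouts, toRows) are invented here for illustration and do not come from any preserved software.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: find every rows-by-columns grid that exactly holds a message of n bits. */
class AreciboLayout {

    /** Returns each {rows, columns} pair with rows * columns == n, excluding 1 x n and n x 1. */
    static List<int[]> rectangularLayouts(int n) {
        List<int[]> layouts = new ArrayList<>();
        for (int rows = 2; rows < n; rows++) {
            if (n % rows == 0) {
                layouts.add(new int[] {rows, n / rows});
            }
        }
        return layouts;
    }

    /** Splits a string of '0'/'1' characters into lines of the given width. */
    static List<String> toRows(String bits, int columns) {
        if (columns <= 0 || bits.length() % columns != 0) {
            throw new IllegalArgumentException("bit count must be a multiple of the column count");
        }
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < bits.length(); i += columns) {
            rows.add(bits.substring(i, i + columns));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Lists the candidate grids a recipient would have to try for a 1,679-bit message.
        for (int[] layout : rectangularLayouts(1679)) {
            System.out.println(layout[0] + " rows x " + layout[1] + " columns");
        }
    }
}
```

Run as written, the program lists 23 rows x 73 columns and 73 rows x 23 columns; the second is the arrangement shown in Fig. 9.1.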
or experience which can apply no matter what the difference in time and is necessary for usability by a wider community. This is a very important consideration which should help to justify the expenditure of those resources in preservation.
[Figure: a Representation Information network with the nodes FITS FILE, FITS STANDARD, FITS DICTIONARY, DDL DESCRIPTION, PDF STANDARD, DICTIONARY SPECIFICATION, DDL DEFINITION, DDL SOFTWARE, JAVA VM, PDF SOFTWARE, XML SPECIFICATION and UNICODE SPECIFICATION. Annotations in the figure read: "In principle we could use this, plus the Dictionaries in order to understand the keywords, in order to extract the numbers"; "If we cannot run the Java Virtual Machine then we use this source code to re-write in another programming language such as C"; "If we can run this then we can run the Java software to extract the numbers"; "If we cannot run this then we can use an emulator or use its RepInfo to re-create a Java VM"; "If we cannot run the DDL software then we can look at the DDL definition and write some software to extract the numbers".]
Fig. 9.2 Using the representation information network in the extraction of information from digitally encoded information (FITS file)
the data. Of course this RIN will also let us know which version of Java is needed and so forth. If the user can run the Java application then it is a simple matter to extract the number. Other options include:
A. If (s)he does not have the correct version of Java at hand then (s)he at least has the option of trying to obtain it from another Registry/Repository, because (s)he knows what is needed.
   a. An important variant of this is the use of emulators, described in Sect. 7.9.
B. If the Java application cannot be run then it might be possible to take the Java source code, if available, and convert it to some other programming language, say C, from which one can create an appropriate application.
C. If neither (A) nor (B) is possible, then a data description language (DDL) such as EAST or DRB, together with the associated data dictionary, may be used. Again there are a number of possibilities.
   a. The easiest is that a generic application such as the one described in Sect. 7.3.5 can use the data description to extract the information needed.
   b. Otherwise one might have to read the DDL description, together with the definition of that DDL and the associated Data Dictionary or other piece of Semantic Representation Information, and then write an appropriate application. This would no doubt be harder, but at least one would not have to guess at what information the digital object holds.
Some of these options are trivial, which would be very convenient for the user. However, if a trivial option is not available then at least the other options are possible: the information can be extracted with considerable certainty and used for other purposes.
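The options above amount to an ordered list of routes, tried from the most convenient to the most laborious until one succeeds. A minimal Java sketch of that pattern follows. The class ExtractionFallback, the route labels and the stand-in value "42.0" are all hypothetical; a real route would run the Java application, a generic DDL tool or hand-written code rather than return a constant.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Sketch: try extraction routes in order of convenience and report the first that works. */
class ExtractionFallback {

    /** Returns (route name, extracted value) for the first route that yields a value. */
    static Optional<Map.Entry<String, String>> extract(
            Map<String, Supplier<Optional<String>>> routes) {
        for (Map.Entry<String, Supplier<Optional<String>>> route : routes.entrySet()) {
            Optional<String> value = route.getValue().get();
            if (value.isPresent()) {
                Map.Entry<String, String> hit =
                        new SimpleImmutableEntry<>(route.getKey(), value.get());
                return Optional.of(hit);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so routes are tried in the order listed.
        Map<String, Supplier<Optional<String>>> routes = new LinkedHashMap<>();
        routes.put("run the existing Java application", Optional::empty);          // pretend: no suitable JVM
        routes.put("port the Java source to another language", Optional::empty);   // pretend: no source code
        routes.put("generic application driven by the DDL description",
                () -> Optional.of("42.0"));                                        // hypothetical value
        routes.put("hand-written reader based on the DDL definition",
                () -> Optional.of("42.0"));

        extract(routes).ifPresent(
                hit -> System.out.println(hit.getValue() + " via " + hit.getKey()));
    }
}
```

Recording which route succeeded, and not only the value, keeps a trace of how the number was obtained.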
9.2.1 Migration/Transformation
Migration, or more precisely Transformation (using OAIS terminology), involves changing the bit sequences from the original to something else. Following the recent revision of OAIS, one can recognise that if this transformation is reversible then one can be confident that no information has been lost. On the other hand, non-reversible transformations have probably lost information, and someone must take responsibility for confirming that the transformation adequately maintains the important information. This is discussed in much more detail in Sect. 13.6.
For those with an eye for recursion, the ways in which the transformation could be carried out are special cases of this sub-section, namely using a single digital object. For example, one can use existing software, the subject of this sub-section, if there is software which can take in the original bit sequences in order to perform the transformation. Alternatively, one could use a data description language (DDL) description to extract values from the original and write them out as the new bit sequences. This could be done using generic applications, as illustrated in Fig. 9.3, or else could be hand-crafted.
The transformation chosen will of course be one which produces something which can be used by the software which has been chosen to deal with the information in the digitally encoded information. Authenticity evidence should of course be provided by someone, providing values and other information about selected Transformational Information Properties (also known as Significant Properties), as discussed in Sect. 13.6.
[Figure: a Representation Information network with the nodes FITS FILE, OTHER DICTIONARY, FITS DICTIONARY, DDL DESCRIPTION, FITS STANDARD, PDF STANDARD, DICTIONARY SPECIFICATION, DDL DEFINITION, DDL SOFTWARE, JAVA VM, PDF SOFTWARE, XML SPECIFICATION and UNICODE SPECIFICATION.]
Fig. 9.3 Using a generic application to transform from one encoding to another
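The reversibility argument for Transformations can be illustrated with a round-trip check: apply the transformation, apply its inverse, and compare the result with the original bit sequence. The Java sketch below assumes a purely hypothetical pair of encodings (packed big-endian 32-bit integers and comma-separated decimal text); neither is taken from the chapter, and the class name RoundTripCheck is invented. Passing such a check on sample objects is supporting evidence, not proof, that the transformation is reversible for every input.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.StringJoiner;

/** Sketch: test whether a transformation round-trips a given bit sequence. */
class RoundTripCheck {

    /** Forward transformation: big-endian 32-bit integers to comma-separated decimal text. */
    static String toText(byte[] binary) {
        ByteBuffer buffer = ByteBuffer.wrap(binary); // ByteBuffer is big-endian by default
        StringJoiner text = new StringJoiner(",");
        while (buffer.remaining() >= Integer.BYTES) {
            text.add(Integer.toString(buffer.getInt()));
        }
        return text.toString(); // any trailing bytes (fewer than 4) are silently dropped
    }

    /** Inverse transformation: comma-separated decimal text back to big-endian integers. */
    static byte[] toBinary(String text) {
        if (text.isEmpty()) {
            return new byte[0];
        }
        String[] fields = text.split(",");
        ByteBuffer buffer = ByteBuffer.allocate(fields.length * Integer.BYTES);
        for (String field : fields) {
            buffer.putInt(Integer.parseInt(field));
        }
        return buffer.array();
    }

    /** True if transforming and then inverting reproduces the original bytes exactly. */
    static boolean isLossless(byte[] original) {
        return Arrays.equals(original, toBinary(toText(original)));
    }

    public static void main(String[] args) {
        byte[] aligned = {0, 0, 0, 1, -1, -1, -1, -2}; // two complete integers
        byte[] ragged = {0, 0, 0, 1, 7};               // one integer plus a stray byte
        System.out.println("aligned input lossless: " + isLossless(aligned));
        System.out.println("ragged input lossless:  " + isLossless(ragged));
    }
}
```

The ragged input shows the non-reversible case: the stray byte does not survive the forward transformation, so someone would have to decide whether it carried information that matters.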
9.2.2 Interfacing
A related but alternative way of using the digital object in one's preferred software is to use or create an appropriate programming interface. Whether or not this is possible depends upon the flexibility of that preferred software, for example whether or not it is possible to use plug-ins. Instead of transforming the digital object as a whole, one essentially does it on the fly, treating only the piece that is needed. The advantage is that one might be dealing with an object of many gigabytes or, in the case of scientific information, many terabytes (1 terabyte = 1,024 gigabytes) or even more. If one is only interested in a small part of the information then transforming the whole digital object may be a waste of effort. Being able to transform only the part that is needed can be a great saving in computation time and temporary disk storage in such circumstances.
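As a rough illustration of treating only the piece that is needed, the Java sketch below seeks directly to one value in a file instead of reading or transforming the whole object. It assumes a hypothetical layout of packed big-endian 64-bit floating-point values with no header; a real object would need its Representation Information to supply the offsets. The class PartialRead and the method readDoubleAt are invented for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: read a single value from a large file without touching the rest of it. */
class PartialRead {

    /** Reads the value at the given zero-based index from a file of packed 64-bit floats. */
    static double readDoubleAt(Path file, long index) throws IOException {
        try (RandomAccessFile data = new RandomAccessFile(file.toFile(), "r")) {
            long offset = index * Double.BYTES;
            if (index < 0 || offset + Double.BYTES > data.length()) {
                throw new IndexOutOfBoundsException("no value at index " + index);
            }
            data.seek(offset);        // jump straight to the value of interest
            return data.readDouble(); // reads 8 bytes, big-endian
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small stand-in file; a real object could be gigabytes or terabytes.
        Path file = Files.createTempFile("values", ".bin");
        try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
            for (int i = 0; i < 1000; i++) {
                out.writeDouble(i * 0.5);
            }
        }
        System.out.println("value 750 = " + readDoubleAt(file, 750));
        Files.delete(file);
    }
}
```

The cost of the read depends on the size of the piece requested, not on the size of the object.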
If a large number of such objects are to be dealt with, the cumulative savings could offset the effort needed to create the programming interface. With luck this may be done automatically; the alternative is to do it manually.
9.2.2.1 Manual Interfaces
The manual option may be described using the data shown in Sect. 19 as an example. That data is essentially tabular. The EAST description allows one to extract individual values. It is in principle fairly easy to implement the following Java methods in order to extend the AbstractTableModel class [71]:
    public int getRowCount();
    public int getColumnCount();
    public Object getValueAt(int row, int column);
If this is done then many Java applications are available to manipulate or display the data (see Sect. 7.8.2.1.2).
9.2.2.2 Automated
The automated option is the most convenient but is not often available. Essentially the manual steps above are carried out automatically. Whether or not this is possible depends, for example, on the amount and type of Representation Information available and the tools which can use them.
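A minimal sketch of the manual interface of Sect. 9.2.2.1 is given below. It extends javax.swing.table.AbstractTableModel and implements the three methods named there, plus getColumnName. It assumes the values have already been extracted into a rectangular array; in practice they would come from a reader driven by the EAST description. The class name ExtractedTableModel, the column names and the sample numbers are all invented for illustration.

```java
import javax.swing.table.AbstractTableModel;

/** Sketch: read-only table model over values already extracted from a digital object. */
class ExtractedTableModel extends AbstractTableModel {

    private final String[] columnNames;
    private final Object[][] rows;

    ExtractedTableModel(String[] columnNames, Object[][] rows) {
        this.columnNames = columnNames.clone();
        this.rows = new Object[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            if (rows[i].length != columnNames.length) {
                throw new IllegalArgumentException("row " + i + " has the wrong number of columns");
            }
            this.rows[i] = rows[i].clone(); // defensive copy so the model cannot change underfoot
        }
    }

    @Override
    public int getRowCount() {
        return rows.length;
    }

    @Override
    public int getColumnCount() {
        return columnNames.length;
    }

    @Override
    public Object getValueAt(int row, int column) {
        return rows[row][column];
    }

    @Override
    public String getColumnName(int column) {
        return columnNames[column];
    }

    public static void main(String[] args) {
        // Hypothetical extracted values; not taken from the data discussed in the text.
        ExtractedTableModel model = new ExtractedTableModel(
                new String[] {"time", "flux"},
                new Object[][] {{0.0, 1.5}, {1.0, 1.7}});
        for (int r = 0; r < model.getRowCount(); r++) {
            for (int c = 0; c < model.getColumnCount(); c++) {
                System.out.println(model.getColumnName(c) + "[" + r + "] = " + model.getValueAt(r, c));
            }
        }
    }
}
```

Any component that accepts a TableModel, for example a javax.swing.JTable created with new JTable(model), could then display the data without knowing how it was encoded.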
9.8 Summary
Although it does not provide all the details, it is hoped that this chapter has given the reader an understanding of how digital objects may be used and re-used over the long term. Examples of some of these uses are provided in Part II. Re-use may not be a trivial process but, if the right Representation Information has been collected, then at least it should be possible. It should also be clear that the formal description techniques offer the possibility of making re-use easier for future users.