
Push for pull

The circuit of findability, use and enrichment


by Catherine Styles

This paper is about the future of online access to national archives. In the sense that online has become the primary mode of accessing national archives, it is about the future of access in general. And in the sense that without good access to archives most people don't use them and they become less relevant, it's also about the future of archives. However grand that may seem, my purpose is simple. In this paper I contend that findability is critical and requires new approaches, and I identify the potential for user-generated description to work in the service of findability.

To facilitate access to the archives in a way that is sustainable into the future, we need to make the records more findable, via the online interface to the collection database, but also via other means. Archival description makes the records findable for those who understand archival systems. But untrained users need to be able to find records in their own way.

The National Archives collection is big. Around 9000 agencies, organisations and persons have created over 45,000 series, which contain around 70 million items.[1] Findability of records within any large collection is a challenge, and National Archives is no exception.

I have found it useful to think of findability in terms of push and pull, after the information architect Peter Morville.[2] Push is the marketing aspect, where we highlight particular records on our website or via other media. Pull is what we want users to be able to do for themselves, as they seek out records of their own choosing. There's always going to be a place for both kinds of findability, but enabling the pull is the crux of the findability challenge.

[1] These figures come from the draft descriptive strategy by Brendan Somes.

[2] My use of this opposition comes from the chapter 'Push and pull' in Peter Morville's book Ambient Findability, 2005: 98-118. Morville is a strong advocate for enabling users to pull information to themselves rather than only pushing information to them: 'Marketing has entered a period of high-speed evolution [...] And in today's attention economy, fitness requires a new balance between push and pull [...] Let's make it easier for our customers to find what they need when they need it.' (103)

National Archives of Australia


Pull

RecordSearch

At the highest level, of entities that create series, the collection is very well described in RecordSearch, the National Archives' online catalogue. Every one of the 45,000 series is also registered, sometimes with detailed notes describing the contents of the records, thereby increasing their chance of retrieval in a search. That is a wonderful situation from the perspective of archivists, for whom intellectual control is paramount. But prospective readers need to identify and request archives at the level of the item. For readers, findability of discrete items is paramount.

Around 10% of the collection is registered at item level.[3] In accordance with archival principles, the item's title generally reflects the title of the original file. As a result, many of these 7 million items are titled using what can appear to a researcher as a catch-all term, such as 'Correspondence'.[4] Few entries include data in the notes field, so if you're searching at the level of items, it is not easy to retrieve all the relevant results. Thanks to the contextual relationships of the Commonwealth Record Series (CRS) system, there are clues to an item's contents in its series and creator.[5] But few researchers come to RecordSearch with an understanding of the CRS system. We need to make it absolutely simple for people to find what they're looking for, or they will pass on by.[6]

[3] If 10% of items are described in RecordSearch, 90% of items are not. Arranging to read any of the undescribed items requires a reference officer, consignment lists and a lot of time. But hundreds of thousands of items are registered every year, and the process for selecting which series to itemise is strategic, based on principles of high use, preservation and so on.

[4] Indeed, in some cases, other organisations have more detailed descriptions of National Archives records than RecordSearch does. This is the case with Film Australia's descriptions of National Archives film.

[5] And obviously, it is most productive to approach national archival research from the perspective of government functions and the agencies responsible for creating records of those functions.

1.6 million items, or just over 2%, are digitised as JPG images. It's a small percentage, but it amounts to 20 million folios that can be read online. You might imagine that fully digitised material would be easier to find in RecordSearch.[7] But that's not so.

You can only find a digitised file via the item to which it is attached. There is no (extra) descriptive data attached to digitised documents. Digitised records are loaded as JPGs, so their text content is not searchable.

So if you're looking for digitised material, it's best if you know the CRS system. It also helps if you know the tricks of limiting search results to those with a digital copy attached.[8]

Other interfaces to the collection database

In addition to the main search interface, RecordSearch, the National Archives provides several other search interfaces: NameSearch,[9] PhotoSearch,[10] and the Bringing Them Home name index.[11]

[6] This point has become so often stated that it hardly needs referencing. I've read it most recently on Sebastian Chan's blog, fresh + new(er): 'If your users can't find it, it may as well not exist.' He is responding to a post by Scott Karp, of Publishing 2.0, which states: 'Web users are utterly unforgiving. If it doesn't work the way I want, I'm going in a click [...] Google understands this. If publishers want to compete, they need to accept [...] that user experience is EVERYTHING.' See also the report commissioned by the British Library and Joint Information Systems Committee, 'Information behaviour of the researcher of the future' (2008:30). The functional and design update to RecordSearch currently in progress will provide contextual help for readers, which will be a significant step in the direction of training users on the fly in the mysteries of the CRS system. For example, it will indicate to users where a series link goes, and what information can be obtained by following it, without the user having to click through. But we are a long way from a completely new interface to RecordSearch.

[7] And so far, RecordSearch is the only way to find items that have not been featured in other web content, because RecordSearch items are not discoverable through search engines.

[8] For example, to retrieve all digitised images in a series, use: Keyword rs@i ph@i; select the option 'Any words' and enter a reference number, eg A11016. Search help (including the trick quoted here) is available within RecordSearch. And we do provide training in using the system. Training takes various forms: workshops (both scheduled and on request), one-on-one assistance in reading rooms, and the step-by-step guide on our corporate website.

[9] NameSearch enables people to search across name-related series such as immigration, naturalisation and defence service files. It is especially useful given that 80% of reference inquiries are for family history.

[10] PhotoSearch enables people to search the bank of digitised photographs.

[11] The BTH name index enables certain people to find records in which appear the names of people and places relevant to Indigenous people. It is a service that responds to recommendations of the National Inquiry into the Separation of Aboriginal and Torres Strait Islander Children from their Families, and aims to help Indigenous people find information about themselves, their families and their country.


Each one serves a particular purpose, and makes it easier to find parts of the collection. As modular solutions to the difficulty of finding particular kinds of records, they work. But they don't operate to improve findability of the collection as a whole.

Promisingly, this year's Ian Maclean Award has been granted for a project to explore the possibilities of developing a visual map of the whole National Archives collection so that people can browse to series of interest. Being able to browse rather than search would be of immense benefit to users who know little of the holdings or the CRS system and, indeed, could work to teach users more of the records' context as they browse.

Push

At the pushy end of the findability spectrum, National Archives has another range of activities. Our most diffuse approach is to select records that are of broad potential interest, and feature them, for example, as a 'Find of the Month' or a 'Pic of the Week'. We have 250 fact sheets and 30 research guides; an evidence-based, biographical narrative site, Uncommon Lives; and two authoritative subject-based sites, Australia's Prime Ministers and Documenting a Democracy. Finally, we have Vrroom, an alternative database of discrete archival records for teachers and students. Wherever possible, when we feature a record we add a link to its RecordSearch item page, so readers can see the record's context and relationships to other records in the archive.

These are all effective approaches to pushing people to find archives. But they are not without their limitations. In every case, the content is selective and resource-intensive to develop and maintain. A more significant shortcoming is that the intellectual work that goes into producing these resources is not fully exploited. We need to be able to re-use descriptive data.

Funnelling description for findability

Many staff of the National Archives read archives in the course of their work and create and extend descriptive data about records.[12] Archives are read for arrangement and description purposes, but also for publications and programs. Historians, curators, educators and editors all read records carefully. Different readers notice different aspects of the records' form and content, and depending on the purpose of their research, they craft descriptions, titles, captions and keywords, as well as situating the records in the broader narrative they are working to construct.[13]

All of this intellectual work is of potential value to other users of the archives in finding and interpreting the records. But its value cannot currently be fully realised, because those rich descriptions are generally used only once, for the exhibition or publication for which they are written. Such data is also stored in National Archives business systems. But some of its utility lies dormant, since neither the description itself, nor an indication of its existence, is captured into RecordSearch. For public researchers, it may as well not exist.

Beyond the staff of the National Archives, public readers use archives for their own purpose, be it family history, academic history, a school assignment, a blog post, an artwork, whatever. And collectively, the public may have a better understanding of the meaning and value of archives than any one archivist or historian, however expert.[14] If it is worth reusing staff descriptions of archival records, it is also worth tapping into the rich resource of public descriptions.

For descriptive data to be fully reusable, it needs to be integrated into archival systems for finding records. In that way, use generates description, which improves findability, which in turn promotes use.[15]

[12] There may be a significant difference between descriptive (contextual) and interpretive (responsive) text, but for the purposes of this paper, it matters little, because both can be useful in the sense of making the records more findable.

[13] Reference officers routinely discover and describe records in the course of assisting public researchers, and they have developed their own system for reusing their work: an ever-expanding subject index to the collection, which is also isolated from other business systems.

[14] Evidence of the wisdom of crowds is mounting in the context of archival description, as elsewhere. To cite a prominent example, the Library of Congress project to acquire descriptive information about thousands of photographs via the Flickr Commons is a stunning success. In a recent story in The Boston Globe, Stephen Mihm reports on crowdsourcing as an emerging phenomenon for generating accurate historical information (Mihm, 2008). See also the blog posts announcing the Library of Congress-Flickr Commons project on the Library of Congress blog (Raymond, 2008) and the Flickr blog (Oates, 2008).

[15] The idea that user description can become a form of archival description is consistent with the notion of the records continuum and, arguably, the CRS system itself. In The Records Continuum, which she edited with Michael Piggott in 1994, Sue McKemmish describes how, as an emerging archival system, the CRS system went beyond description for control, and incorporated information required for subsequent action, such as administering access (190). Her point that the object of description changed with the CRS system is also pertinent here: 'The object of description ceased to be the creation of a surrogate (word photograph) of the physical grouping of records in the repository. It became instead the creation of knowledge representations' (191, my emphasis). The notion of the post-custodial revolution also suggests that archival description should continue to grow from the point of creation through records' contemporary use. Like archival description, researchers' and other users' descriptions provide additional information about a record: not about its prior existence, but about its present.

The idea of using researchers' annotations to enhance archival finding aids already has a history. Michelle Light and Tom Hyry's paper 'Colophons and annotations' (which cited various projects in train at the time) was published in 2002. Light and Hyry argue that by including only the archivist's description (that is, by excluding all subsequent interpretations of records) the traditional finding aid privileges the first reading of a collection, arresting its evolution at a particular moment in time (226), and that annotations could promote discovery by augmenting the existing form of access (228).


In the next part of the paper I flesh out this notion by tracing a historical record from creation through contemporary use to archival description and findability and onward, into its modern-day use, noting how each use of the record potentially makes it more findable. Using the following diagram for structure, I begin in the centre, at the point of a record's creation.


Record creation and contemporary use

NAA: A659, 1939/1/16561, p. 103


In February 1934, Victor Fitzgibbon wrote a note to the Secretary of the Department of the Interior. The Department had given him four weeks' work so he could leave Canberra with his family. In that time he had saved enough to buy and recondition a truck, and he now sought a grant to register the vehicle for three months.

CS Daley, Assistant Secretary, put the handwritten note into a typewriter to make his recommendation. He approved the grant as a debit to the Alleviation of Distress, on the grounds that Mr Fitzgibbon's continued residence in Canberra would be a greater burden to the Alleviation of Distress than the amount requested. The Secretary must have been away, because Daley adds a handwritten note: 'In view of urgency, take action as proposed and resubmit for covering approval on Secretary's return.' Another annotation suggests that the grant was issued four days later, and HC Brown, Secretary of the Department, noted his approval about a week after that.

In this first phase of the record's life, it has served its purpose as attestation to the need for the grant; as documentation of the Assistant Secretary's recommendation for approval, and on what grounds; of the funds' disbursal; and of the belated approval for such.

If you read the other documents in the file that relate to Mr Fitzgibbon, you can get a fuller picture. It was the tail end of the Depression. Victor Fitzgibbon had arrived in Canberra after 1929, so he was ineligible for the rations available to other residents in similarly difficult circumstances. He was living at Ainslie married camp, with his pregnant wife and infant child. Several months before writing this letter, he had agreed to leave the Territory by mid-January if he was unable to find work. From the Department's point of view, the Fitzgibbon family had received special treatment, on account of the young child and Mrs Fitzgibbon's pregnancy. In fact, one document notes that in 1929, Victor Fitzgibbon's father had been granted a ticket to Melbourne, only to return unannounced with Victor and his family. Probably, the Department was keen to see the back of the Fitzgibbons, its sympathy having expired.

The final instalment in the archival story is a small note pinned to the letter. The Assistant Secretary states 'Has Fitzgerald [sic] actually left on the vehicle'. Another hand has written 'Please verify from police'. A final note states 'Fitzgibbon left Canberra Thursday last 22.2.34 destination unknown'.

Archival description and findability

This document was placed in a file labelled 'Ration relief. Travelling unemployed'. The catalogued and now digitised record maintains that title. It also now bears the reference number A659, 1939/1/16561, which places it in series A659, 'Correspondence files, class 1 (general, passports)' of the Department of the Interior. I could have found the file by searching for 'unemployed' and '1934', but I could not have found it by searching for 'the Depression', even if I had searched at the level of series, because (unlike some series notes) the notes for A659 don't mention the Depression.

Modern use and description

To discuss modern-day uses of this record, I'll start with the obvious example: my own. I am using this record to illustrate how records are used, described and made findable. This record could also be useful as evidence of the experience of the Depression, for descendants of the Fitzgibbons researching their family history, or for historians researching early Canberra.

In the course of introducing this record, I generated 411 words' worth of descriptive data. Every use of a record is likely to generate description, as mine did, and since web applications make it so easy to self-publish, those uses are likely to result in the data being available online.

There are two distinct ways in which that descriptive data could function to improve findability of the item from which this record comes. Again, we can consider them in terms of push and pull: push being the marketing approach, of distributing the description, and pull being the systems engineering approach, of harvesting descriptions in the service of findability.

New forms of findability

Having crafted those words, I might have shared them in various ways, or for various purposes, each of which can be seen to push people into viewing the record. I could use email, a blog tool, Facebook or any other medium to generate further interest. There are a heap of ways people can use an archival record without ever visiting the naa web domain. Social media is rich with possibilities here, and National Archives is beginning to make use of it. For example, we publish some material on Flickr and then embed it in our own site.[16] Also, we are working on a feature that uses Google Maps to enable people to browse to World War I service records by place of birth or enlistment, and we plan to use the digital scrapbook tool, Tumblr, so that people can leave a note or a photograph about a service person. Clearly, the opportunities to push for findability are abundant in the current web environment.

What about pull? How can multiple descriptions make it easier for people to find records for themselves?

Better findability in RecordSearch

What would happen if my description of the Depression record was funnelled into RecordSearch, and became searchable along with the archival data? As a string of words all equally searchable, those paragraphs may not be of optimal value in a search for relevant records. If all such descriptions were deployed in a keyword search, the user would retrieve a greater number of results, but would they retrieve all the relevant results? And would all the results they retrieved be relevant? Maybe; maybe not.[17] But if we were to extract useful concepts from that string of words, and arrange them into categories that would help to improve the aim of a search, the findability gain might be significant.

[16] Indeed, this initiative has significantly improved findability of those records. Most of the traffic in fact comes from within Flickr.

[17] As findability expert Peter Morville notes, information retrieval depends on both recall (all relevant results) and relevance (precision, only relevant results). Full-text searching optimises findability where the data set is small, but the larger the data set, the lower its success rate. (Morville, 2005:49-52)


Places: Canberra; Melbourne; Ainslie

People: Victor Fitzgibbon; CS Daley; HC Brown

Gov. bodies: Civic Branch; Department of the Interior

Format: handwritten; annotation; typewriter

Other concepts: residence; camp; married, infant, pregnant; truck, register, vehicle, transport; Alleviation of Distress; grant, funds; rations; difficult; 1934; Depression
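The gain from arranging terms into categories can be illustrated with a small sketch. The Python below is a minimal illustration under stated assumptions, not a description of how RecordSearch works: the record identifiers, the two extra records and all function names are hypothetical, and only the category names and the first record's terms are taken from the categories listed above. It compares a flat keyword match with a match restricted to one category, and scores each by precision and recall.

```python
from collections import defaultdict

# Only the first record's terms come from the paper; the other two records
# and every identifier here are hypothetical, added so the search has
# something to discriminate between.
RECORDS = {
    "fitzgibbon-letter": {
        "places": {"canberra", "melbourne", "ainslie"},
        "people": {"victor fitzgibbon", "cs daley", "hc brown"},
        "gov_bodies": {"civic branch", "department of the interior"},
        "format": {"handwritten", "annotation", "typewriter"},
        "concepts": {"depression", "rations", "grant", "truck", "1934"},
    },
    "hypothetical-passport-file": {
        "places": {"sydney"},
        "people": {"hc brown"},
        "gov_bodies": {"department of the interior"},
        "format": {"typewriter"},
        "concepts": {"passports", "canberra"},  # Canberra as a topic, not a place
    },
    "hypothetical-relief-file": {
        "places": {"canberra"},
        "people": set(),
        "gov_bodies": {"department of the interior"},
        "format": {"typewriter"},
        "concepts": {"depression", "rations"},
    },
}


def build_index(records):
    """Map each (category, term) pair to the set of record ids that carry it."""
    index = defaultdict(set)
    for record_id, categories in records.items():
        for category, terms in categories.items():
            for term in terms:
                index[(category, term)].add(record_id)
    return index


def keyword_search(index, term):
    """Flat search: the term may match in any category."""
    hits = set()
    for (_category, indexed_term), ids in index.items():
        if indexed_term == term:
            hits |= ids
    return hits


def faceted_search(index, **criteria):
    """Categorised search: a record must match every category=term pair."""
    result = None
    for category, term in criteria.items():
        ids = index.get((category, term), set())
        result = set(ids) if result is None else result & ids
    return result if result is not None else set()


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved records that are relevant.
    Recall: share of relevant records that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


index = build_index(RECORDS)

# Suppose the reader wants records of events that took place in Canberra.
relevant = {"fitzgibbon-letter", "hypothetical-relief-file"}

flat = keyword_search(index, "canberra")
faceted = faceted_search(index, places="canberra")

print("flat keyword:", precision_recall(flat, relevant))
print("by category: ", precision_recall(faceted, relevant))
```

In this toy data the flat search returns all three records, including the passport file that merely mentions Canberra as a topic, so recall is complete but one result in three is irrelevant. Restricting the term to the Places category returns only the two relevant records. With real descriptions the categories would be noisier, so the size of the gain is an open question.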

There's a lot of potential here, to combine human and computer-generated metadata. In fact, the Powerhouse Museum is already testing and refining such a technique, using their curator-crafted descriptions.[18] Clearly, there are cultural and technical issues to address, and I'll touch on those shortly. But the point I want to make here is one of principle. User-generated descriptions are an excellent raw resource for making records more findable through RecordSearch, even, I would argue, if some of them are off the mark, or superficial, or wrong. The point is, it would benefit the National Archives and our public users to explore ways to co-opt these descriptions, to put them to work in the service of greater findability. In other words, let's devise ways of using push for pull.

More use and more value, and so on

Findable records are usable records. If records were to become more findable via diverse channels of social networks and via RecordSearch, they would also be used by more people, in more ways, some of which we cannot anticipate. What we can reasonably anticipate is that every engagement with the records potentially enriches the records. In short, there is an interdependent relationship between engagement and enrichment, users and archives.

[18] The Powerhouse Museum has a processor for auto-generating tags operating in beta mode in the online collection database, OPAC. It extracts terms from existing curatorial descriptions and converts the information to RDF. For a good example, see: http://www.powerhousemuseum.com/collection/database/?irn=348799. BBC RadioLabs has also developed a prototype tool using Wikipedia and Lucene to generate tags to make individual broadcasts more findable (Sizemore, 2008).


Archives engage users; users enrich archives, in a perpetual, mutually beneficial, generative cycle. For me, as a web professional, what this means is that the future of archives is in the hands of their users, and we should find ways to allow that cycle to happen. In my view, the more momentum this cycle gains, the more the National Archives as an institution will flourish.

On the technical front, staff of the National Archives are beginning to consider ways to take advantage of this cycle.[19] One issue that may be of concern to archivists is that descriptions, like records themselves, need context. If you come upon someone's description, you'll wonder: what is its status? Who wrote it, and for what purpose? In other words, there needs to be a mechanism by which user-generated description can be authorised. Certainly, that's something to consider in the design of any system that harnesses user descriptions for findability. And it's something I'm really interested in, because in the digital age, the process of authorisation, of accruing authority, is changing.[20]

But issues of systems design aside, the first hurdle for this kind of proposal, and possibly the largest, is a cultural one. It depends on being open to collaborating with communities of users, many of whom are not archivists or historians. In particular, such collaborations depend on allowing communications through the network to pass freely: letting go of the urge to stand as gatekeeper.[21]
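One way to picture the context a user description would need to carry (who wrote it, and for what purpose) is as a small record attached to the catalogue entry it describes. The Python below is a rough sketch only: the class names, field names and role labels are hypothetical, the description text is a paraphrase written for illustration, and nothing here reflects an actual National Archives system. It shows a description linked to its source item by reference number, and a search text that can include or exclude descriptions according to who wrote them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class UserDescription:
    """A description of a record, kept together with its own context."""
    item_ref: str     # reference number of the item being described
    text: str         # the description itself
    author: str       # who wrote it
    author_role: str  # hypothetical labels, e.g. "staff" or "public"
    purpose: str      # why it was written


@dataclass
class ItemEntry:
    """A catalogue entry that can accumulate descriptions over time."""
    item_ref: str
    title: str
    descriptions: List[UserDescription] = field(default_factory=list)

    def attach(self, description: UserDescription) -> None:
        # Refuse descriptions that point at some other item, so the link
        # between a description and its source record stays intact.
        if description.item_ref != self.item_ref:
            raise ValueError("description refers to a different item")
        self.descriptions.append(description)

    def searchable_text(self, roles: Optional[Set[str]] = None) -> str:
        """Title plus attached descriptions, optionally limited by author role."""
        parts = [self.title]
        parts += [
            d.text for d in self.descriptions
            if roles is None or d.author_role in roles
        ]
        return " ".join(parts).lower()


entry = ItemEntry("A659, 1939/1/16561", "Ration relief. Travelling unemployed")
entry.attach(UserDescription(
    item_ref="A659, 1939/1/16561",
    text="Depression-era request for a grant to register a truck",  # illustrative paraphrase
    author="C. Styles",
    author_role="staff",
    purpose="conference paper",
))

print("depression" in entry.title.lower())        # title alone
print("depression" in entry.searchable_text())    # title plus descriptions
```

Keeping author, role and purpose alongside the text means a search interface could let readers decide how much weight to give each description, rather than the institution deciding in advance which descriptions count.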

[19] Social software, the semantic web project, and open source software offer many opportunities on this front. Resources for implementing such ideas can be minimal, as Mike Ellis and Brian Kelly contend in a paper on overcoming organisational resistance to Web 2.0: 'small-scale solutions can, and should, be rolled out very easily. Benefits can be measured and fed back quickly, and used as input into a virtuous cycle of support for these technologies. This is Rapid Application Design (RAD) for the web: build it, test it, amend it, then rinse and repeat...' (Ellis and Kelly, 2007).

[20] In his most recent book, Everything is Miscellaneous: The Power of the New Digital Disorder, David Weinberger uses the example of Wikipedia to make this point, describing how it is through the effort of avoiding claiming any authority that Wikipedia becomes authoritative. Readers are in every sense expected to be active, and assess each article on its merits (Weinberger, 2007:142).

[21] Only then can the immense value of online access be realised, as Kenneth Hamma, Executive Director for Digital Policy and Initiatives at the Getty Institute, warns: 'Social value realised by the investment in being digital increases exponentially in the network, but only if we let the network operate unimpeded. [...] Every time we insert our desire to be gatekeeper, that habit we've learned very well with physical resources, we disrupt the network, we diminish the return we might realise on the investment in being digital.' (Hamma, 2006:15)


If we can do that, the burden of making records accessible will no longer be the National Archives' alone. It can be shared with peer-to-peer networks of committed users that already exist and that would develop along with the architecture and culture of interaction.[22]

In our existing approach to promoting online access to the collection, we generate descriptive data that is used once but is difficult to reuse, and which remains disconnected from the source records. We may notice the descriptive work of public users, but we can neither consistently track it, nor put it to work. The proposal in this paper is to allow the descriptive activity of staff and the public to enrich the original archives, and in the process make them more findable and in turn more used.

If we can recognise and begin to work with this cycle of user engagement and archival enrichment, by forging connections between user descriptions and the source records, we can simultaneously draw on and feed back into the power of that cycle. By developing a mutually beneficial relationship with our reading public, we could also begin to transform into a new kind of institution: one that is less like a gatekeeper, and more like a host, of an architecture and culture of interaction with the archives.[23]

Findability will always require multiple approaches. The proposal in this paper will not solve the issue once and for all, but this kind of strategic shift would sustain and increase the use, relevance and value of national archives into the future. It is worth exploring.

© 2008 National Archives of Australia. Dr Catherine Styles is Managing Editor, Websites, at the National Archives of Australia. This paper was first presented to the Australian Society of Archivists annual conference in Perth, 9 August 2008.

[22] Networks of committed users supply a lot of energy to the descriptive project, and they do so without great incentives. If we make it an easy, gratifying experience for users to enrich archives, they will help just because they want to contribute to a greater good. This point is borne out both in the smaller-scale projects of the National Archives to engage groups of volunteers to help describe particular series of records, and on a larger, technologically enhanced scale, by 'smart mobs' that work in concert even if they don't know each other. Howard Rheingold writes, 'the time is right to combine conscious cooperation, the fun kind, with the unconscious reciprocal altruism that is rooted in our genes' (Rheingold, 2002:212).

[23] In 'The future of learning institutions in a digital age', Cathy Davidson and David Theo Goldberg, co-founders of HASTAC (Humanities, Arts, Science & Technology Advanced Collaboratory), redefine institutions as 'mobilising networks'. Such a definition is apt here.


References

Chan, Sebastian. 2008. User experience is all that matters: A reminder about content, search and users. fresh + new(er).
http://www.powerhousemuseum.com/dmsblog/index.php/2008/06/06/user-experience-is-all-that-matters-a-reminder-about-content-search-and-users/

Davidson, Cathy, and David Theo Goldberg. 2007. The Future of Learning Institutions in a Digital Age.
http://www.futureofthebook.org/HASTAC/learningreport/about/

Ellis, Mike, and Brian Kelly. 2007. Web 2.0: How to stop thinking and start doing: Addressing organisational barriers. Presentation at the Museums & the Web conference. San Francisco.
http://www.archimuse.com/mw2007/papers/ellis/ellis.html

Hamma, Kenneth. 2006. Investing for the public. Presentation at the Museum Computer Network conference. Pasadena.
http://www.mcn.edu/conferences/index.asp?subkey=1230

Karp, Scott. 2008. If your users fail, your website fails, regardless of intent or design. Publishing 2.0.
http://publishing2.com/2008/06/05/if-your-users-fail-your-website-fails-regardless-of-intent-or-design/

Light, Michelle, and Tom Hyry. 2002. Colophons and annotations: New directions for the finding aid. The American Archivist 65, no. 2 (Fall/Winter): 216-30.

McKemmish, Sue. 1994. Are records ever actual? In The Records Continuum, 187-203. Clayton, Victoria: Ancora Press in association with Australian Archives (now National Archives of Australia).

Mihm, Stephen. 2008. Everyone's a historian now. The Boston Globe.
http://www.boston.com/bostonglobe/ideas/articles/2008/05/25/everyones_a_historian_now/

Morville, Peter. 2005. Ambient Findability. Sebastopol: O'Reilly Media.

Oates, George. 2008. Many hands make light work. flickr blog.
http://blog.flickr.net/en/2008/01/16/many-hands-make-light-work/

Raymond, Matt. 2008. My friend flickr: A match made in photo heaven. Library of Congress blog.
http://www.loc.gov/blog/?p=233

Rheingold, Howard. 2002. Smart Mobs: The Next Social Revolution. Cambridge: Basic Books.

Sizemore, Chris. 2008. Wikipedia + Lucene's MoreLikeThis = useful bits about the bits? BBC RadioLabs. June 13.
http://www.bbc.co.uk/blogs/radiolabs/2008/06/wikipedia_plus_lucene_morelikethis.shtml

Weinberger, David. 2007. Everything is Miscellaneous: The Power of the New Digital Disorder. New York: Times Books.
