Kevin Champion of Team Serendipity 1

Recommender System Application

University Library Materials

This report explores the design of a system to make recommendations of university
library materials. These materials will consist mostly of books, but the idea is
flexible enough to also handle journal articles, videos, and perhaps even web content. The
primary notion is that the library is ordinarily thought of as a repository of materials. This,
however, is really just its first-order service. A second-order service that a university library is
poised to provide is embedded in the expertise of the university's librarians and faculty and
their familiarity with the materials. With this in mind, a recommender system could be
developed to draw on this rich body of knowledge to curate subsets of the overall library
collections, which could then be used to make recommendations to users. A large number of
these subsets from across the university could be interconnected and used to surface new
content to users, enhance their experience, and break down artificial barriers created by
different subject areas.

Domain Characteristics

The domain of university library resources is an interesting area to examine because
the traditional focus of libraries is to provide access to the widest possible
set of materials. However, this is far from their only function. Indeed a valid
argument can be made that simply purchasing and making available materials is only one
part of the process of making these materials truly accessible to users. A more holistic
perspective of providing access to materials requires that additional effort be placed on how
the user is able to discover and interface with these materials. In this vein, accessibility is a
much larger concept. For instance, the fact that a user can retrieve a particular book
from the libraries' collections without being charged a fee does not ensure that the user will
actually be able to discover that this book exists, know that the university libraries own it,
and find and retrieve the physical book. Over time libraries have put
considerable work into providing catalogs that are easy to search, along with finding aids that
help users check whether their library system holds the resources they are interested in
(e.g., Get It). Nonetheless, in almost all instances, users are still required to have a very good
idea of what they are seeking before interfacing with the libraries' tools. That is to say,
libraries have yet to find truly effective ways of helping users discover
materials when they do not know what they need; in effect there are as yet few
tools that help users browse materials as they might the physical shelves of a bookstore.
Browsing is an important function that libraries and library users would benefit from in
an online environment. Browsing enables serendipitous discoveries whereby users are able
to find resources that they did not know they needed. In essence, browsing provides the
antithetical experience to searching; users are not required to have a good idea of what they
are looking for before engaging in the browsing experience. Engineering this sort of
serendipitous discovery relies upon providing a highly relevant experience to users in much
the same way that search depends upon relevance. One way that libraries have traditionally
attempted to tackle the issue of relevance is through classification via the Library of
Congress system, among others, and the use of metadata about the materials. A sort of
pseudo-browsing experience is often enabled in library search systems whereby users are
able to engage in faceted browsing once an initial search has been made. This functions
much like many popular and highly usable online shopping sites whereby results can be
narrowed and expanded by adding or removing categories or other metadata descriptors.
While this is a proven method for interfacing with search results listings, it does not address
the scenario whereby the user does not know what to initially search and it does not utilize
the expertise embedded in the collective “mindspace” of the university's librarians and
faculty.
One method that deserves exploration, both to create a truer browsing experience and to
utilize the expertise in the university, is the use of a recommendation framework to surface
highly relevant resources for users. One activity that both professors and librarians are
already doing is creating lists of resources tailored to specific classes, topic areas, or
disciplines. Professors do this every time they develop a reading list for a particular course
whereas librarians do this when working with specific courses to help students with their
research, when creating subject guides, and when curating targeted physical collections.
Each of these lists is carefully curated by professionals who are experts in
their domain. As such, an item that appears in one of these lists is distinguished from any
item in the libraries' collections that appears in none of them. Furthermore, an
item that appears in more than one list is distinguished from an item that appears in only one.
Because of this, an occurrence of an item in one of these lists can be thought of as a weighted
vote for that item. Using this frame of thought, a recommendation system can be developed
to take into account these weighted votes. The resulting recommendations can be presented
to users in an interface tailored to browsing and focusing on connecting highly relevant
materials to each other.
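The weighted-vote idea can be illustrated with a minimal sketch in Python. The list names, item identifiers, and weights below are hypothetical; this report does not prescribe how much weight any particular kind of list should carry.

```python
from collections import defaultdict

# Hypothetical curated lists: (list id, weight, item ids).
# The weights are assumed values chosen only for illustration.
CURATED_LISTS = [
    ("course-reading-list", 1.0, ["book-a", "book-b", "article-c"]),
    ("subject-guide", 1.0, ["book-b", "article-c", "video-d"]),
    ("physical-collection", 0.5, ["article-c"]),
]


def tally_votes(curated_lists):
    """Sum the weight of every list in which each item appears."""
    votes = defaultdict(float)
    for _list_id, weight, items in curated_lists:
        for item in set(items):  # an item counts once per list
            votes[item] += weight
    return dict(votes)


votes = tally_votes(CURATED_LISTS)
# Highest vote total first; ties broken alphabetically for a stable order.
ranked = sorted(votes.items(), key=lambda pair: (-pair[1], pair[0]))
```

Here an item that appears in several lists accumulates a higher total than an item that appears in one, and an item in no list has no entry at all.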
Academic materials in the form of books and research papers have specific
characteristics that make working with them different from working with popular books,
commercial products, or other more ordinary recommender domains such as
movies or restaurants. The most salient of these differences is the sheer quantity of
academic resources available. When attempting to find materials on a broad topic, users will
most often run into problems of scale, in which the number of resources in existence in a
particular domain is multiple orders of magnitude greater than the user needs or can process.
In addition, while almost all of these materials have metadata, the metadata is often less useful
than in more popular and commercial areas.
The ineffectiveness of metadata has a number of causes, but one of
them is the highly specialized language used to describe resources in different subject areas.
Over time, academia has developed an ever broadening set of fields and disciplines as
scientific methodology has necessitated ever increasing levels of specialization. This creates
a branching effect, which allows for specialization but also can tend to separate disciplines
from each other. Since each discipline develops its own language to describe itself, highly
related disciplines that end up on different branches sometimes lose connection to each
other. However, reality dictates that a more appropriate metaphor is a web and that even
though academia has created branches that effectively put fields into their own silos, they
often share many characteristics and ideas in common. One way of connecting these
branches once more is to build a web by mapping resources. However, instead of using
metadata to do this, there is an opportunity to use co-occurrence in lists to draw
connections between distinct resources. This idea is not dissimilar to the groundbreaking
“PageRank” algorithm developed at Google, which weights webpages by the links they receive
from other webpages. In fact, many of the characteristics specific to academic resources are
also found in webpages (scale, ineffective metadata). As a result, if we
think of Google's results as an explicit form of recommendation engine, it is not a stretch to
see the applicability of a recommender system to this idea of linking together academic
resources.
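A minimal sketch of this mapping follows. It counts how often two resources are listed together and treats each count as the strength of a link between them. The list names and contents are hypothetical, and simple pair counting is only one assumed way of drawing these connections.

```python
from collections import Counter
from itertools import combinations

# Hypothetical curated lists from different subject areas.
LISTS = {
    "biology-101": ["book-a", "book-b", "article-c"],
    "statistics-guide": ["article-c", "book-d"],
    "bioinformatics": ["book-b", "article-c", "book-d"],
}


def co_occurrence(lists):
    """Count how many lists contain each unordered pair of items."""
    pairs = Counter()
    for items in lists.values():
        for left, right in combinations(sorted(set(items)), 2):
            pairs[(left, right)] += 1
    return pairs


def neighbours(item, pairs):
    """Return the items linked to `item`, strongest link first."""
    linked = Counter()
    for (left, right), count in pairs.items():
        if left == item:
            linked[right] += count
        elif right == item:
            linked[left] += count
    return linked.most_common()


pairs = co_occurrence(LISTS)
linked = neighbours("book-d", pairs)
```

In this example a statistics title ends up linked to biology titles purely because curators listed them together, without any reference to subject metadata.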

Design Dimensions

Note on privacy

Public and academic libraries place supreme importance on the privacy of their users.
As a result, they do a number of things to ensure that users maintain privacy when using
library resources and services. One of the most important mechanisms that libraries employ
for maintaining privacy is to simply not track and store user behavior. What this means is
that libraries intentionally purge information about what resources a particular user has
checked out or viewed in the past. As a result, libraries are not put in an ethically
compromised position if the government comes to them asking for information about a
particular user; if they do not hold any information, they cannot be made to breach that user's
privacy.
This policy has implications for developing user interfaces and experiences. Since the
libraries do not store this information, they cannot create an interface that allows a user to
login to her account and view her previous checkout history, for instance. It also has
implications for the type of recommendation systems that can be built for library materials.
Since user data is intentionally purged, user-user algorithms and other collaborative filtering
techniques that depend on per-user histories are not possible; we have to presume an
environment in which we do not
have user-specific information. Users are unable to rate resources and the library does not
have a way of tying browsing behavior to an individual user over time. That said, there are
some libraries that are developing opt-in systems to enable some of these features if users
agree to the privacy implications. Nonetheless, this paper attempts to outline a system that
can be effective using item-item algorithms and content filtering approaches in the university
context without the need for user-specific data.

Content-based recommending

One way of building a recommender for library resources is to utilize the reading and
resource lists created by professors and librarians. In terms of the recommender, each list
can be treated like a user and each occurrence of an item in a list can be treated as a vote for that
item. Since these lists are curated by domain experts, we can be confident that
by counting an item's presence in a list as a vote the recommender will be working with
resources that have a high degree of both quality and relevance. This technique
produces a matrix of lists and items, with the lists along one axis and the items
along the other. Using this matrix, a simple item-item recommender algorithm can
recommend resources related to the one currently being viewed. Additionally, since
there will be far more resources than lists, and since
relatively few resources will be listed more than once (i.e.,
will have more than one vote), this matrix will work best with algorithms that deal
well with sparse ratings matrices. Consequently, an SVD algorithm can be used here to
discover features of the items based on their votes and make recommendations based on
these features.
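The following sketch illustrates the difference between comparing items directly on the vote matrix and comparing them on SVD-derived features. Everything in it is an assumption made for illustration: the tiny matrix is invented, only two features are kept, and the SVD is approximated with a plain power iteration so that the example needs no external library. A real system would more likely use an established numerical package.

```python
import math
import random

# Hypothetical vote matrix: rows are curated lists, columns are items.
# A 1 means the item appears in (is "voted for" by) that list.
ITEMS = ["book-a", "book-b", "book-c", "film-d", "film-e"]
VOTES = [
    [1, 1, 0, 0, 0],  # list 0
    [0, 1, 1, 0, 0],  # list 1
    [0, 0, 0, 1, 1],  # list 2
    [0, 0, 0, 1, 1],  # list 3
]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def norm(vector):
    return math.sqrt(sum(x * x for x in vector))


def cosine(left, right):
    denominator = norm(left) * norm(right)
    if denominator == 0:
        return 0.0
    return sum(a * b for a, b in zip(left, right)) / denominator


def truncated_svd(matrix, k, iterations=200, seed=0):
    """Approximate the top-k singular triples by power iteration with deflation."""
    rng = random.Random(seed)
    residual = [[float(x) for x in row] for row in matrix]
    triples = []
    for _ in range(k):
        transposed = [list(column) for column in zip(*residual)]
        v = [rng.random() + 0.1 for _ in residual[0]]
        for _ in range(iterations):
            v = matvec(transposed, matvec(residual, v))
            length = norm(v)
            if length == 0:
                break
            v = [x / length for x in v]
        u = matvec(residual, v)
        sigma = norm(u)
        if sigma == 0:
            break
        u = [x / sigma for x in u]
        triples.append((sigma, u, v))
        # Deflate: remove the component just found before seeking the next.
        residual = [
            [residual[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))
        ]
    return triples


def item_features(matrix, k):
    """Describe each item (column) by k latent features scaled by singular value."""
    triples = truncated_svd(matrix, k)
    return [[sigma * v[j] for sigma, _u, v in triples] for j in range(len(matrix[0]))]


raw_columns = [list(column) for column in zip(*VOTES)]
latent = item_features(VOTES, k=2)

# book-a and book-c never share a list, but both share a list with book-b.
raw_similarity = cosine(raw_columns[0], raw_columns[2])
latent_similarity = cosine(latent[0], latent[2])
```

On the raw columns, book-a and book-c look unrelated because they never appear in the same list. On the two latent features they are close, since both are tied to book-b, while neither is close to the films. This is the kind of indirect relationship that makes a feature-based approach attractive when direct overlap is sparse.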
Along with this way of counting instances of resources in lists as a vote, other
content-based techniques can be used in other algorithms to develop interesting
recommendations. There are a number of useful pieces of metadata that each resource is
likely to have. Most of these forms of metadata will be useful for recommending similar
items, but the following will be most effective: list metadata that details the course the list is
used for and the subject area/s the list items deal with, item subject data derived from the
Library of Congress subject headings, full-text descriptions of items derived from abstracts
and summary paragraphs, and additional subjects or “tags” applied to the items by the
professors and librarians when they list them. Of these four types of metadata, three of them
are essentially keywords that are used to categorize the items into some sort of taxonomy.
As such, these can all be grouped together and used to create a content keyword frequency
matrix, which can then be fed into a content-filtering SVD algorithm. The fourth type, the
full-text descriptions, can be used by first running them through a natural language
processing tool in order to derive keywords from the descriptions. However, the
resultant keywords should probably not be combined with the categorical keywords
contained in the other metadata because they come from an uncontrolled vocabulary and
are not used for taxonomical purposes. All of the other metadata come from controlled
vocabularies and will thus result in a more concentrated frequency matrix. Adding the natural
language keywords would pollute this concentration rendering this keyword matrix less
effective. Instead, the abstract keywords can construct their own content keyword frequency
matrix which can be fed into another SVD algorithm to generate similarities.
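A minimal sketch of building the two separate keyword frequency matrices is shown below. The record fields, the sample records, and the stopword list are all hypothetical, and the regular-expression tokenizer is only a crude stand-in for whatever natural language processing tool would actually be used.

```python
import re
from collections import Counter

# Hypothetical catalog records; the field names are assumptions.
RECORDS = {
    "book-a": {
        "list_subjects": ["ecology"],
        "lc_subjects": ["ecology", "animal populations"],
        "tags": ["field methods"],
        "abstract": "A survey of field methods for the study of animal populations.",
    },
    "book-b": {
        "list_subjects": ["statistics"],
        "lc_subjects": ["mathematical statistics"],
        "tags": ["field methods"],
        "abstract": "Statistical methods for the analysis of survey data.",
    },
}
CONTROLLED_FIELDS = ("list_subjects", "lc_subjects", "tags")
STOPWORDS = {"a", "an", "and", "for", "of", "the"}


def controlled_terms(record):
    """Count terms from the controlled-vocabulary fields only."""
    counts = Counter()
    for field in CONTROLLED_FIELDS:
        counts.update(record.get(field, []))
    return counts


def abstract_terms(record):
    """Crude stand-in for a natural language processing step."""
    words = re.findall(r"[a-z]+", record.get("abstract", "").lower())
    return Counter(word for word in words if word not in STOPWORDS)


def frequency_matrix(records, term_counter):
    """Return (item ids, vocabulary, one row of term counts per item)."""
    items = sorted(records)
    counts = {item: term_counter(records[item]) for item in items}
    vocabulary = sorted(set().union(*counts.values()))
    rows = [[counts[item][term] for term in vocabulary] for item in items]
    return items, vocabulary, rows


# Two separate matrices, so free-text keywords never dilute the controlled terms.
subject_items, subject_vocab, subject_rows = frequency_matrix(RECORDS, controlled_terms)
text_items, text_vocab, text_rows = frequency_matrix(RECORDS, abstract_terms)
```

Keeping the matrices apart means each can be passed to its own SVD, as described above, rather than mixing an uncontrolled vocabulary into the taxonomic one.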
It must also be mentioned that by thinking of lists as users and the existence of items
in lists as votes for those items, it is possible to utilize a pseudo-user-user algorithm. In this
case the user-user algorithm might be better described as a list-list algorithm. By feeding the
vote matrix into a list-list algorithm, it would be possible to generate recommendations of
other lists similar to the current list being viewed. The simple user-user algorithm would
discover these similar lists by finding the lists that share the greatest number of
resources. Since we have already discussed that this sort of overlap will be sparse (even
though it will exist), an SVD algorithm would be more effective in this instance as well
because it can relate lists through latent features rather than only through direct overlap, which will
hopefully result in more relevant recommendations of similar lists.
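The simple list-list approach can be sketched as follows, using hypothetical lists. It ranks other lists purely by the number of resources they share with the current one, which is the baseline the SVD-based variant is meant to improve on.

```python
# Hypothetical curated lists.
LISTS = {
    "ecology-101": ["book-a", "book-b", "article-c"],
    "field-biology-guide": ["book-a", "book-b", "video-d"],
    "statistics-guide": ["article-c", "book-e"],
    "art-history-201": ["book-f"],
}


def similar_lists(target, lists, top_n=3):
    """Rank other lists by the number of resources they share with the target."""
    target_items = set(lists[target])
    scored = []
    for list_id, items in lists.items():
        if list_id == target:
            continue
        shared = len(target_items & set(items))
        if shared:  # lists with no overlap are never recommended
            scored.append((list_id, shared))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


similar = similar_lists("ecology-101", LISTS)
```

Note that a list with no shared resources can never be recommended by this baseline, which is exactly the limitation that sparse overlap creates.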

Design recommendations

In order to create the most useful recommendation system, I recommend the use of
most of the algorithmic content-based techniques mentioned above. In this vein, I think a
hybrid system should be used to help surface related resources to users. When a user views
a particular item in a particular list, the recommendation system will employ a number of
techniques to feed recommendations into the interface.
The matrix of “votes” created from resources existing in lists will be fed into an SVD
algorithm using ten features (in the optimization process the number of features will be
tweaked to get optimal results). The content keyword frequency matrix derived from subject
classifications will also be fed into an SVD algorithm. Using the output features and
weightings of each SVD calculation, a weighted feature combination technique will be
employed to join the features of the two content-based SVDs. This combined approach
is itself a hybrid of the “weighted” (Burke, 2002, p. 339) and “feature combination” (Burke,
2002, p. 341) hybridization techniques and will work by first artificially inflating the weightings
of the list-based SVD to give its results primacy and then combining the features so that
one set of recommendations is output. This approach of combining the SVDs will allow the
resources' existence in lists to reveal relationships, but will also utilize the inherent
taxonomic connections between resources that have been described by classification
experts.
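One way to picture this weighted feature combination is sketched below. The feature vectors are invented stand-ins for the output of the two SVDs, and the inflation factor of 2.0 is an assumed value that would need tuning. The sketch scales the list-based features, concatenates them with the subject-based features, and ranks items by cosine similarity on the combined vector.

```python
import math

# Hypothetical per-item features standing in for the two SVD outputs.
LIST_FEATURES = {"book-a": [0.9, 0.1], "book-b": [0.8, 0.2], "film-c": [0.1, 0.9]}
SUBJECT_FEATURES = {"book-a": [0.2, 0.7], "book-b": [0.6, 0.3], "film-c": [0.3, 0.7]}


def combine_features(list_features, subject_features, list_weight=2.0):
    """Inflate the list-based features, then concatenate both feature sets."""
    return {
        item: [list_weight * value for value in list_features[item]]
        + list(subject_features[item])
        for item in list_features
    }


def cosine(left, right):
    dot = sum(a * b for a, b in zip(left, right))
    scale = math.sqrt(sum(a * a for a in left)) * math.sqrt(sum(b * b for b in right))
    return dot / scale if scale else 0.0


def recommend(item, combined, top_n=2):
    """Rank the other items by similarity to `item` on the combined features."""
    scores = {
        other: cosine(combined[item], vector)
        for other, vector in combined.items()
        if other != item
    }
    return sorted(scores, key=lambda other: (-scores[other], other))[:top_n]


combined = combine_features(LIST_FEATURES, SUBJECT_FEATURES)
recommendations = recommend("book-a", combined)
```

With the list-based features inflated, book-b is the closest match to book-a. If `list_weight` is set to 0.0 so that only subject features count, film-c comes first instead, which shows how the weighting gives the curated lists primacy.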
In addition to these two algorithms, a third will be run on the content keyword
frequency matrix created from the full-text abstracts of each resource. This matrix will be fed
into its own SVD, and its weightings and features will be combined with the
hybrid results described above. This will be done using one of two techniques
depending on the interface to be employed: weighted combination or “mixed” combination
(Burke, 2002, p. 341). If the desired interface requires only one set of recommendations then
a weighted combination will occur whereby the full-text recommendations will be weighted
as less important than the combined vote and subject keyword based recommendations.
This is the case because the full-text derived recommendations do not take into account the
professors' and librarians' expertise, a key element that is missing in current systems and that
this paper proposes will lead to better recommendations. In interfaces which can
accommodate a more complex display, the full-text recommendations will be displayed in a
separate location alongside the other recommendations using a “mixed” strategy.
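The two presentation strategies can be contrasted in a short sketch. The scores below are invented, and the full-text weight of 0.3 is an assumed value. The only requirement taken from the design is that full-text results count for less than the curated results.

```python
# Hypothetical recommendation scores for the item being viewed.
CURATED_SCORES = {"book-b": 0.9, "article-c": 0.6, "video-d": 0.3}
FULL_TEXT_SCORES = {"video-d": 0.95, "book-e": 0.7, "book-b": 0.1}


def rank(scores, top_n):
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


def weighted_hybrid(curated, full_text, full_text_weight=0.3, top_n=3):
    """Blend both score sets into a single ranked list."""
    combined = {}
    for item in set(curated) | set(full_text):
        combined[item] = (
            (1 - full_text_weight) * curated.get(item, 0.0)
            + full_text_weight * full_text.get(item, 0.0)
        )
    return rank(combined, top_n)


def mixed_hybrid(curated, full_text, top_n=3):
    """Keep the two result sets apart so the interface can show them separately."""
    return {"curated": rank(curated, top_n), "full_text": rank(full_text, top_n)}


single_list = weighted_hybrid(CURATED_SCORES, FULL_TEXT_SCORES)
two_panels = mixed_hybrid(CURATED_SCORES, FULL_TEXT_SCORES)
```

The weighted version suits an interface with room for one set of recommendations, while the mixed version returns two independent rankings for a more complex display.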
Lastly, a further type of recommendation will be used to recommend other lists similar to
the current one being viewed. For this set of recommendations the original matrix of
resources and lists will be sent to an SVD to discover features of the lists, which will lead to
recommendations for other lists. These recommendations will be displayed in the interface
apart from all the other recommendations as they are distinct.

Performance

Performance in this system is not likely to be a problem because almost all of the
computation can happen offline before recommendations are needed. Since this system
does not have to deal directly with user input via ratings or other usage metrics, it has little
need to update in real time. The main event that would require the algorithms to be rerun is
a list being added to or edited in the system. Since this will happen relatively infrequently, most
computation can happen offline without negatively impacting the user experience.
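The offline pattern described here can be sketched as a small cache that is rebuilt only when a list is saved and is otherwise read with a plain lookup. The class and method names are hypothetical, and simple co-occurrence counting stands in for the full SVD pipeline to keep the example short.

```python
from collections import Counter
from itertools import combinations


class RecommendationCache:
    """Precompute recommendations whenever a list changes; serve by lookup."""

    def __init__(self):
        self.lists = {}
        self.table = {}
        self.rebuild_count = 0

    def save_list(self, list_id, items):
        """Add or edit a list, then recompute the whole table offline."""
        self.lists[list_id] = list(items)
        self.rebuild()

    def rebuild(self):
        neighbours = {}
        for items in self.lists.values():
            for left, right in combinations(sorted(set(items)), 2):
                neighbours.setdefault(left, Counter())[right] += 1
                neighbours.setdefault(right, Counter())[left] += 1
        self.table = {
            item: [
                other
                for other, _count in sorted(
                    counts.items(), key=lambda pair: (-pair[1], pair[0])
                )
            ]
            for item, counts in neighbours.items()
        }
        self.rebuild_count += 1

    def recommend(self, item, top_n=5):
        """Request-time path: no computation, only a dictionary lookup."""
        return self.table.get(item, [])[:top_n]


cache = RecommendationCache()
cache.save_list("list-a", ["x", "y"])
cache.save_list("list-b", ["x", "y", "z"])
results = cache.recommend("x")
```

Because `recommend` never triggers a rebuild, the cost of the algorithms is paid only when a librarian or professor saves a list.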

Interface

This recommendation system is geared primarily at developing a highly relevant and
useful information architecture for academic library resources. As a result, the end goal of
the recommender engine is not as simple as outputting a list of items that can be put into a
widget-like box somewhere on an already existing library page. In fact, this system requires
an entire web framework to be built, and requires that the recommendations from the system
be tightly integrated with this framework in a usable interface.
A framework needs to be developed to house the lists that librarians and professors
create. Each list will be housed at its own unique URL that librarians and professors could
share with their users. Lists themselves will be a part of a larger system that houses all of the
lists. Each list will be characterized by a highly visual, fast, and interactive interface that
encourages clicking and browsing. APIs will be used to pull in book cover and other images
to represent each resource visually, and JavaScript will be used heavily to utilize the
processing power of the browser to ensure a highly responsive level of interaction. When an
individual resource is selected, a light-box will open displaying more information about that
resource and the algorithms' recommendations of other resources. These
recommendations will be displayed just as visually, with an image of the resource
along with a title and perhaps an indication of which list/s it appears in. Recommendations of
other lists will display in a sidebar of each main list page. Because the interface is so visual, there will
be very few textual descriptions to detail why these recommended resources are being
presented to the user. Instead, the interface will leave these details vague in the belief that
users need not concern themselves with the specifics if they are finding interesting
resources. While there are many alternatives to the specifics of how this user-interface could
be developed, ensuring that it is highly visual and fast to interact with will be key pillars of its
success.
In addition to the actual interface of lists of resources, this system can be integrated
with the main library catalog. When a user views an item in the main catalog, an interface
element can be added to the information about that item that shows and links to the lists it
has been listed in. This type of element has been pioneered in the commercial sector with
features like Amazon's Listmania. There, shoppers create lists of products, and when a shopper
navigates to a product that has been listed, an element on the product page displays
the lists it belongs to and related items from those lists.
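The data behind such a catalog element could be as simple as an inverted index from items to the lists that contain them. The sketch below uses hypothetical list and item identifiers, and the function names are assumptions rather than part of any existing catalog software.

```python
from collections import defaultdict

# Hypothetical curated lists.
LISTS = {
    "ecology-101": ["book-a", "book-b"],
    "field-guide": ["book-b", "video-d"],
}


def build_item_index(lists):
    """Map each item to the ids of the lists that contain it."""
    index = defaultdict(list)
    for list_id in sorted(lists):
        for item in dict.fromkeys(lists[list_id]):  # drop duplicates, keep order
            index[item].append(list_id)
    return dict(index)


def catalog_panel(item, lists, index):
    """Gather what the catalog page for `item` would display."""
    memberships = index.get(item, [])
    related = []
    for list_id in memberships:
        for other in lists[list_id]:
            if other != item and other not in related:
                related.append(other)
    return {"lists": memberships, "related_items": related}


index = build_item_index(LISTS)
panel = catalog_panel("book-b", LISTS, index)
```

An item that appears in no list simply yields an empty panel, so the catalog page can omit the element in that case.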

Drawbacks and pitfalls

While this recommendation system and user interface are conceptually sound, they are not
practically implementable given the current state of most universities. In this section I will
predominantly relate the situation at the University of Michigan, and make the broad
assumption that it is characteristic of most universities. Even though the university has a
huge amount of knowledge embedded in the course reading lists from professors, these lists
are not made available to the public with any consistency. There are open initiatives such as
Open.Michigan that attempt to release university courseware under an open license, but
even these efforts are not sufficient for the type of system envisioned here. Even presuming
that all courseware at the university could be published openly, this system would still
require consistently formatted resources and machine-parseable reading lists. Given that
the university is highly decentralized, the sort of standardization that would be required to
accomplish this is not likely to occur easily. It is more conceivable that librarians could
organize around a standardized way of developing and formatting these types of lists, but
without the lists from professors' courses, the system would not contain enough
resources to lead to useful recommendations.
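To make the standardization requirement concrete, the sketch below shows what a machine-parseable reading list and a basic check on it might look like. The JSON layout and every field name are hypothetical. No such standard exists at the university, which is precisely the difficulty described above.

```python
import json

# Hypothetical minimum fields for a list and for each item in it.
REQUIRED_LIST_FIELDS = ("list_id", "title", "curator", "items")
REQUIRED_ITEM_FIELDS = ("item_id", "title")

SAMPLE = """
{
  "list_id": "ecology-101",
  "title": "Introductory Ecology Readings",
  "curator": "professor",
  "items": [
    {"item_id": "book-a", "title": "Field Methods"},
    {"item_id": "article-c", "title": "Population Surveys", "tags": ["statistics"]}
  ]
}
"""


def parse_reading_list(text):
    """Parse a reading list and reject it if required fields are missing."""
    data = json.loads(text)
    missing = [field for field in REQUIRED_LIST_FIELDS if field not in data]
    if missing:
        raise ValueError(f"reading list is missing fields: {missing}")
    for position, item in enumerate(data["items"]):
        missing = [field for field in REQUIRED_ITEM_FIELDS if field not in item]
        if missing:
            raise ValueError(f"item {position} is missing fields: {missing}")
    return data


reading_list = parse_reading_list(SAMPLE)
item_ids = [item["item_id"] for item in reading_list["items"]]
```

A list that fails this kind of check would have to be corrected by its curator before it could contribute votes to the recommender.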
In addition to issues with the practical prerequisites of this system, it would be quite
challenging to get the user interface right so that this tool is a solution instead of just
another set of webpages that add to the already complicated and confusing library web
profile. A big part of developing the user interface would also involve developing an
administrative interface that makes it easy for librarians and professors to create and manage
lists. This would be of prime importance because most of these professionals would not be
technically knowledgeable and most of them would not use such a system if it required a
great deal of effort to set up. So, in addition to the end-user interface, the
system would have to get the administrative interaction right so that it is simple and
pleasant to use for librarians and professors. That said, if the administrative interface were
successful it could be used as an opportunity to add additional information to the system to
make it more effective. For instance, as mentioned above, one possibility would be to allow
librarians and professors to add keywords or tags to resources as they are creating the lists,
which would presumably make recommendations even more effective.

Future possibilities

Along with the recommendations already mentioned, if this system were put in place
and were successful, there would be a lot of opportunity to expand the system by
implementing user-input based recommendations. This could conceivably be accomplished
by offering users an opt-in system in which they were able to choose to allow the library to
store certain usage and user-input data. If this were in place, the library could track browsing
behavior and solicit user-inputted profile information, which could be used to create a
keyword/subject profile of the user. This profile could then be used to help weight the
recommendation engines so that items would be more relevant to that user's profile. In
addition to these mechanisms, actual user-ratings, reviews, and even user contributed
tagging could be added to the system if it developed a critical mass of usage. Individual
items could solicit ratings from users, which could then be run through additional algorithms
to modify the existing recommendations or create new ones.
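As a rough sketch of how an opt-in profile might weight existing recommendations, the example below boosts items whose subjects match a user's keyword profile. The scores, subjects, profile, and boost factor are all hypothetical, and multiplying scores in this way is only one assumed approach.

```python
# Hypothetical baseline scores from the list-based recommender.
RECOMMENDATIONS = {"book-a": 0.6, "book-b": 0.5, "video-d": 0.3}
ITEM_SUBJECTS = {
    "book-a": ["ecology"],
    "book-b": ["statistics"],
    "video-d": ["statistics", "ecology"],
}
# Hypothetical opt-in profile: subject keyword -> strength of interest.
PROFILE = {"statistics": 1.0}


def personalise(recommendations, item_subjects, profile, boost=0.5):
    """Re-rank recommendations, boosting items that match the user's profile."""
    rescored = {}
    for item, score in recommendations.items():
        interest = sum(profile.get(subject, 0.0) for subject in item_subjects.get(item, []))
        rescored[item] = score * (1 + boost * interest)
    return sorted(rescored, key=lambda item: (-rescored[item], item))


personal_ranking = personalise(RECOMMENDATIONS, ITEM_SUBJECTS, PROFILE)
default_ranking = personalise(RECOMMENDATIONS, ITEM_SUBJECTS, {})
```

A user who has not opted in has an empty profile, so the ranking falls back to the unpersonalized order.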

Note on sources

This paper was developed in concert with a project I am doing to submit to the
iDesign competition for the University of Michigan Libraries. Due to this, much of the domain
specific information and knowledge within was ascertained from a series of interviews and
discussions with University of Michigan librarians and library staff, along with staff of the
Open.Michigan program. Also of note is that this paper outlines a number of aspects of the
actual design I will be submitting for the iDesign competition.

References

Burke, R. (2002). Hybrid recommender systems: Survey and experiments. User Modeling
and User-Adapted Interaction, 12(4), 331-370.
