
Evaluation of JSTOR DfR:

Usability Testing
Team 622: Morgan Burton, Isabela Carvalho, Tze-Hsiang (Stan) Lin, Lei
Shi
April 12, 2010

Word count: 4,000

JSTOR DFR (Data for Research): An online researcher-oriented
exploration tool provided by JSTOR.
Executive Summary
Our team worked with the Data for Research (DfR) development team, in conjunction with JSTOR, to analyze and evaluate the usability of the DfR interface. The DfR system provides a multifaceted web-based search interface to help users create, refine, and download datasets of metadata, word frequency counts, citations, n-grams, and key terms, along with visualization facilities for users to explore and analyze the data. Our team conducted a series of assessments of DfR to develop practical recommendations that will help DfR meet the needs of potential users worldwide.
Following our internal heuristic evaluation, we conducted a series of usability testing sessions with 5 participants. Each session consisted of assessing the participant's initial reaction to the DfR site, performing a series of 5 tasks using the DfR tool, and gathering reflective comments and suggestions after the participant had become familiar with the system.
Participants generally liked using DfR for research, though they encountered frustrations related primarily to the organization of result information and to navigation, confirming the heuristic issues our team found previously. Our findings and recommendations included the following:
1) Users did not understand that different sections of the site were interrelated, connected through the search.
2) On the results list page, it is unclear which search query the results correspond to.
3) Users failed to understand the aggregation mechanics of the system: each new search query is added to the previous search rather than starting a new one.
4) The purpose and description of the Data for Research tool are fairly intuitive, contrary to conclusions derived from previous research data.
5) It is unclear what some of the graphs on the main page are about.
For each of these findings we developed specific recommendations, primarily layout changes that enhance the clarity of system status.
Introduction
JSTOR (short for Journal STORage), founded in 1995, is a non-profit organization that provides an online system for archiving academic journals and other academic content. It is dedicated to helping scholars, researchers, and students discover, use, and build upon a wide range of scholarly content in a trusted digital archive (JSTOR, 2010). As of October 1, 2009, more than 5 million articles from more than 1,000 journals were available online through the JSTOR system (JSTOR Facts, 2009).
DfR (Data for Research) is a web-based service provided by JSTOR that enables users to visualize JSTOR data and to create, refine, and download datasets of metadata, word frequency counts, citations, n-grams, and key terms for a specified period, journal, discipline, etc. (JSTOR DfR, 2010). Content from the JSTOR archive can be manipulated through the multifaceted online search interface or through a RESTful API to meet the researcher's needs. The retrieved data sets can then be visualized with several different facilities and refined before being downloaded for offline use.
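To make concrete what these datasets contain, the short sketch below (our own illustration in Python, not DfR code) computes word frequency counts and bigrams for a small piece of text, roughly the kind of per-document counts the DfR downloads describe.

from collections import Counter

def tokenize(text):
    # Lowercase and strip simple punctuation; a toy tokenizer for illustration only.
    tokens = [t.lower().strip('.,;:"()') for t in text.split()]
    return [t for t in tokens if t]

def word_counts(text):
    # Word frequency counts for one document.
    return Counter(tokenize(text))

def ngrams(text, n=2):
    # n-gram counts (bigrams by default) for one document.
    tokens = tokenize(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

article = "The dragon appears in early myth; the dragon also appears in later folklore."
print(word_counts(article).most_common(3))  # e.g. [('the', 3), ('dragon', 2), ('appears', 2)]
print(ngrams(article).most_common(1))       # e.g. [(('the', 'dragon'), 2)]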
The interface provides an iterative process of narrowing the default result set (all documents from the entire JSTOR archive) down to only the documents of interest, according to numerous selectors. These selectors can be either the query terms users type in the search box or document attributes (facets) provided by the system in the left sidebar. Once users enter search terms and/or select facet values, the result set is refined and several visualized or retrieval views of the new data set are updated dynamically in the content pane (the right two-thirds of the page). A tabbed options bar at the top of the content frame lets users choose different methods of representing results: the "Summary" tab provides diagrams that summarize the result set; the "Results List" tab lists all results with links to detailed descriptions of all available data formats; the "Key Terms" tab creates a tag cloud of terms automatically extracted from all documents in the result set; and the "References Profile" tab uses charts to depict the average number and age of references per article by year for all documents in the result set (Burns et al., 2009).
Usability testing for the Data for Research (DfR) site involves evaluation of the system and interface by selected potential users (chosen based on our previous research and findings) who are not familiar with DfR. It lets us observe in real time how people generally use the site, which features they find valuable and easy to use, and the frustrations they encounter, along with suggestions for optimizing the DfR site. In conducting usability testing on the DfR interface, we wanted to address the following research questions and issues:
a) What do new users expect from using the site, in terms of
terminology and functionality?
b) Is the DfR interface intuitive enough for new users to correctly
and efficiently use the site without needing much guidance or
reference?
c) How do users understand search results and their organization? Does
the DfR interface match their understanding?
d) What features do users find to be helpful or lacking in using DfR?
Will exposure to new ways of exploring data aid their purposes and
goals?
e) Ultimately, what is the best way the DfR can be adjusted to meet
the needs of more diverse potential users and encourage repeat
usage by all groups?

Methods
Multiple passes evaluating the DfR site against usability heuristics exposed a number of issues, many related to navigation and system feedback. To verify and further evaluate our findings, we conducted usability testing, which brings in naïve subjects to evaluate the site without knowledge of the heuristics. The purpose of usability testing is to contextualize and provide insight into users' thoughts about and analysis of the system in question. For DfR, it was very important to find users who had the appropriate level of expertise yet had not used the DfR tool before, so that we could gather fresh first impressions and issues with using the site. The DfR interface must be fairly intuitive to users, or else they will form negative impressions of the site and be less likely to return.
Participants
In searching for appropriate participants, we attempted to seek out people who fit the "casual researcher" persona established from our previous personal interviews and surveys. We soon realized that the "casual researcher" persona may heavily overlap with people who spend a great deal more time on research. Though we had focused on the amount of time and intensity spent gathering information to define types of research, defining research levels according to purpose allowed more flexibility in finding users. As all participants were naïve to the DfR interface, we assumed they would encounter similar issues regardless of the tasks they were attempting.
Through email and personal contacts, we recruited 5 subjects for our usability testing. All were graduate students at the University of Michigan (2 PhD, 3 Masters), ranging between 24 and 30 years of age. Three were female and two were male, which was slightly different from the participant makeup in our persona interviews (4 male, 1 female). All used computers at least daily, and all but one considered themselves moderate or serious researchers.
Materials/Testing Procedure
After obtaining consent to participate in our research, we set up
specific appointments in one afternoon for users to come test the DfR
interface using two laptops provided by members of our group. The
testing session consisted of 4 phases.
First, subjects were read an introductory script that thanked them for agreeing to participate in the evaluation process. They were introduced to DfR WITHOUT being told its purpose or given a specific description. They were then informed of the following procedures and made aware that their participation was voluntary, that the system rather than they was being tested, and that they could stop at any time if they felt uncomfortable [see Appendix A: Script].
If they agreed, the group member responsible for recording
information on the logging form would ask the participant for some
general background information, including their age, gender,
occupation, highest education completed, and their subjective level of
research [see Appendix B: Logging Form].
After this, a shorter script describing the following pre-task was read
[see Appendix A: Script]. The subject was then given a pre-task that
involved assessing their first impressions of the DfR site without
actually using the system. Participants were shown the DfR in a web
browser window and asked to answer three open-ended questions [see
Appendix C: Pre-Task Questionnaire].
After completion, participants were read a longer, more detailed introduction to DfR for JSTOR. The testing session, consisting of 5 tasks, was then described, and participants were reminded that the system, not they, was being tested. In addition, users were informed that their screen movements would be recorded using a software program called Camtasia for the Macintosh operating system (http://www.techsmith.com/camtasiamac).
Participants completed the 5 tasks and were then read a final script introducing the last phase of the testing session: the debriefing and reflection questionnaire. If the participant was comfortable, the group members present during the session would ask questions about using the system, including whether they would use it again, what features they liked and did not like about the DfR system and why, and suggestions for a "perfect" or ideal experience with the DfR interface [see Appendix D: Post-Test Questionnaire]. In addition to notes made on the questionnaire, notes were recorded on the logging form as well [see Appendix B: Logging Form]. Participants were then thanked again for participating in the usability session and excused.
Analysis
After each interview, the interviewer and note taker would briefly discuss the information gathered from the participant, filling out the logging form with more detailed information. This resulted in a small list of extended inferences that were then presented to the entire team at a later date. Two days after all testing sessions were completed, the entire team met for an interpretation session, where we identified common issues and sentiments among our participants. The Camtasia video was reviewed by all team members to find specific patterns commonly used to search in the DfR interface. Many specific issues and sentiments were shared by all 5 participants; these were documented and grouped according to the heuristic violated. The groups were then used to generate practical recommendations for resolving the issues.

Findings and Recommendations


General Findings
The majority of our participants had positive impressions of and experiences with the DfR interface. Three out of five participants found the system functionality rather intuitive and demonstrated that the purpose of DfR is evident, contrary to information gathered previously. All participants experienced frustrations related primarily to the system feedback and navigation heuristics, which are explained in detail below. However, DfR's recent interface changes have probably increased the appeal of the system, as most participants (4 out of 5) said they would use the DfR tool again for research requiring some advanced analysis.

Key Findings
The key findings and recommendations below are ordered from high priority to low priority, according to the severity of each finding.
Key Finding 1:
Users did not understand that different sections of the site were
interrelated, connected through the search.
Different sections of the page appear to be disconnected when they are actually connected. The sections in question are the summary graphs, the results list, and the key terms; the "References Profile" section is also, to some extent, related. These main sections, which make up the bulk of the site's features, all reflect the search performed on the main page, yet the visual layout does not give sufficient indication that they are intricately interrelated.
Evidence:
Users were unsure what to do with the search boxes on the different pages, not understanding their relationship with the search on the main page. Some users explained that they were unsure whether their search had worked or not. On the results list page, users were met with a list of articles and an empty search box. This gave them conflicting information: the empty search box signals that they have an empty search, yet the list of articles clearly relates to some search. In fact, the list of articles relates to the search they performed on the main page. One user in particular ran a search on the main page and then, when he went to the results list, retyped the search in that page's search box. The result was the opposite of what he wanted: instead of getting the results he was after, the system discarded his search. He found this very frustrating. This shows that it is not clear to users how the different features relate to the main search on the summary (main) page.
Recommendations
We recommend displaying the links in question (Results List, Key Terms, References Profile) under the search box, inside the grey box, on the main page. This would indicate to users that they are subsections of that search. In addition, the search box (the entire grey area) should be moved to a higher or more prominent place on the page.

Key Finding 2:
Inadequate feedback is provided when users complete certain actions, which can leave them confused and frustrated, uncertain whether their actions succeeded.
Evidence:
When users perform certain retrieval actions, such as running a query or selecting a facet to refine the results, few visible changes appear on the page to confirm that the action succeeded. As Figure 1 shows, after a query is run there is no text description or color emphasis to indicate what changed, apart from a single number changing in the "Total Selected" box in the upper left corner of the page and a "Selection Criteria" box added below it. These subtle changes give users little concrete support for judging whether their actions succeeded.

Figure 1: The DfR main page (a) before any search and (b) after searching "dragon". As the two screenshots show, there are few indicators reflecting the user's actions; the layout and content of the page remain largely static after a search.
Because the retrieval process DfR provides differs from what people usually encounter (users choose facets to narrow the entire database down to a selection that meets their needs), it is especially important for users to be aware of their current status. Although the "Selection Criteria" box is designed to record and display the current retrieval status, our usability testing showed that subjects usually failed to notice it because of its position in the upper left corner, above a chart selector (Figure 2). The unnoticeable "Selection Criteria" box often made users forget or become confused about their current status, leading to unexpected consequences.
Figure 2: The "Selection Criteria" box after searching "dragon". As this screenshot shows, the box appears without any indication or emphasis, and its style is consistent with the other page components, so users tend to overlook it.
Recommendations
- a): Redesign DfR's feedback so that text descriptions or color emphases appear immediately in response to user actions.
The elements that need to be added or modified include updating the title of the current page, keeping the latest search term in the search box, describing more explicitly which search criteria the current results are based on, and emphasizing the changes when a query is performed. With these changes, users could identify their current status and track their progress more easily, reducing confusion and frustration.
- b): Move the "Selection Criteria" box to a more eye-catching place.
The "Selection Criteria" box could be moved from the upper left corner to the top of the right-hand area, just below the large search box, where users would notice it at first sight. The box should also be shown at all times, even when there is no current query. Given DfR's search mechanism, results are always available even before users perform a query, so in that case the "Selection Criteria" box should not be empty; instead, a few words should indicate that no criteria have been selected yet.

Key Finding 3:
Users' cognitive model when performing a search is the same one they use with Google. However, JSTOR DfR's search mechanism is different, and users are confused about how to start a search in DfR.

Evidence:

1.
Users do not understand that when they add more selection criteria, the criteria aggregate rather than start a new search. In our observations, each time a user started a new task they did not clear their previous selection criteria. This sometimes left them with no results at all, because the combination of previous and current criteria narrowed the results too far.

2.
In task 4, users were supposed to use three terms, "politics", "republicans", and "witch", to get the result. We expected them to type these key terms individually, as shown in Figure 3. However, users showed a tendency to enter the key terms in a row, as shown in Figure 4. These two criteria sets produce totally different results, but users had no idea of the difference between the two.

Figure 3. Search by individual keywords

Figure 4. Search by keywords in a row

3.
In task 4, we asked users to remove one of the key terms used in the previous query. Because the previous query contained 3 key terms, users often just retyped the series of key terms without "witch" in order to remove it. As a result, the second query was added to the original set of selection criteria, as shown in Figure 5. This does not match what people expect when they try to remove a keyword from a previous query.

Figure 5. The aggregated criteria

4.
When people search for something in Google, they tend to use "keywords" or "key terms" to find things. In that context, anything entered in the search bar is treated as keywords by default, and Google's crawled index returns results ranked by relevance. When users do the same thing in JSTOR DfR, however, they do not realize that "key term" there refers to the query field "Extracted Keyterms" used throughout the system, which is quite different from the general meaning of "keyword" in Google. Because DfR's search mechanism includes many query fields, such as "Content Type", "Discipline", and "Discipline Group", users are expected to assign one of these fields to whatever they type in the search bar. The field is assigned with the dropdown menu next to the search bar; nevertheless, users often overlook this dropdown menu and simply click the submit button. By default this assigns "anywhere" to what they typed, meaning the term is matched against every query field. For example, one user performing task 1 became confused after he typed "politics" in the search bar and then clicked "Political Science" under "Discipline Group" in the left sidebar (Figure 6).

Figure 6. The different types of query fields


Recommendations:

Recommendation 1:
Incorporate Google-style search features into the DfR search bar, for example treating a space between terms as AND, phrase search (""), excluded terms (-), and an OR operator.

DfR's cognitive model is more like that of a desktop application for professionals or researchers. Casual researchers and non-technical users, however, bring their own ingrained model of search. As the dominant online search engine, Google strongly shapes people's search habits, and people apply Google's conventions to every system they use to search; separating keywords with a space as an implicit AND, for example, is very common. Because the metadata structure in DfR is not user-friendly, it is hard for a beginner to figure out how DfR search works. That does not necessarily mean DfR needs to change its metadata structure. Even though making the search function more intuitive is not without challenges, the team can at least work on the search bar and make it a little more "googlized", as sketched below. That way users may feel more comfortable with the search bar and get the results they actually want more easily.
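As a rough illustration of what a "googlized" search bar might do, the hypothetical sketch below (ours, not part of DfR; all names are invented) parses a query string with space-separated AND terms, quoted phrases, minus-exclusions, and an OR operator into a structured criteria set that the existing facet machinery could then apply.

import re

def parse_query(q):
    # Parse a Google-style query into AND terms, phrases, excluded terms,
    # and OR groups. A hypothetical sketch, not DfR's actual parser.
    criteria = {"must": [], "must_not": [], "phrases": [], "any_of": []}
    # Pull out quoted phrases first, e.g. "origins of the universe".
    for phrase in re.findall(r'"([^"]+)"', q):
        criteria["phrases"].append(phrase.lower())
    q = re.sub(r'"[^"]+"', ' ', q)
    tokens = q.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.upper() == "OR" and criteria["must"] and i + 1 < len(tokens):
            # "a OR b": group the previous term with the next one.
            criteria["any_of"].append([criteria["must"].pop(), tokens[i + 1].lower()])
            i += 2
        elif tok.startswith("-") and len(tok) > 1:
            criteria["must_not"].append(tok[1:].lower())
            i += 1
        else:
            # Space-separated terms act as an implicit AND.
            criteria["must"].append(tok.lower())
            i += 1
    return criteria

print(parse_query('politics republicans -witch "witch trials" salem OR boston'))
# {'must': ['politics', 'republicans'], 'must_not': ['witch'],
#  'phrases': ['witch trials'], 'any_of': [['salem', 'boston']]}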

Recommendation 2:
Remove the dropdown menu next to the search bar.

When people search with the search bar, they just want to find whatever is most relevant to what they typed. The most appropriate option in the dropdown menu is therefore "Anywhere". We recommend setting this as the default and removing the menu, which confuses users and is frequently overlooked.

Recommendation 3:
Relocate the "Selection Criteria" section from the left sidebar to a position near the search bar.
DfR uses a narrow-down approach, filtering results by the criteria users have set up. People who are familiar with Google search expect the system to start a fresh query every time they type something; they have no idea that the system aggregates the criteria. Relocating the "Selection Criteria" box and placing it near the search bar may effectively inform users that they are using the narrow-down approach (the criteria are aggregated). Users commonly ignore the "Selection Criteria" section in the left sidebar and therefore do not understand that they are continually adding criteria. Relocating this section would help users better understand what they are doing.

Key Finding 4:
The purpose and description of the Data for Research tool are fairly
intuitive, contrary to conclusions derived from previous research data.
Evidence:
In our persona interviews and survey exercise, we devoted much of our effort to designing questions and explanations that easily and succinctly described to participants what DfR is and how it is used. Even after multiple revisions, participants in those previous exercises seemed to indicate that they did not understand the purpose of DfR as a search tool for gathering information about the articles in JSTOR. At that time, we recommended that the main page of the DfR site include a small blurb or short sentence telling visitors what DfR's purpose is, particularly in relation to JSTOR as a whole. [The DfR development team, after further contact, indicated to us that such a recommendation was a branding issue. With DfR still in beta, it was determined that a blurb at this point would not be very helpful or accurate, though they wanted more information to gain insight into how new visitors to the DfR site understood the DfR tool before they actually used it.]
The pre-assessment portion of our usability testing sessions consisted
of a short questionnaire that subjects filled out while viewing the main
page of the DfR site (for the first time, to the best of our knowledge).
Questions for the pre-assessment were designed to assess how
intuitive using a tool such as the DfR would be from simply observing
its appearance and features. We found that all subjects in our usability
test fairly accurately described the general functions and purpose of
the DfR from only viewing (and not using) the main page of the DfR
website. One question asked subjects, “From the front page, what do
you expect this site is about?” Below are the open-ended answers our
subjects provided:
“The statistics about the publications, categorized by publication year, discipline.”
“I think it's a site that gives information about articles published on certain topics.”
“Searching for scholarly articles by date and discipline/area.”
“it seems like the front page can allow me to search for articles and the website
categorize all the articles by discipline groups in general.”
“This is a websites showing some statistics about paper publications and properties
in JSTOR.”
The above statements use similar, accurate descriptors, theorizing that DfR provides some searching function that gives certain information about articles, though none explicitly said that DfR provided "metadata". One interesting note is that 4 of 5 subjects included the "discipline" or "discipline groups" feature in their description.
Subjects also had a fairly accurate understanding of DfR's relationship to JSTOR as a tool that provides information using the articles in JSTOR's corpus. Four of five subjects identified DfR as "providing information about the JSTOR database/articles", and all recognized that the DfR tool operates within JSTOR.
Recommendations
- a): It is not necessary to have a descriptive blurb or sentence about
what the DfR is on the main site, as it is fairly intuitive. Even with a
description, users will probably do some exploration of the site to get a
“feel” of how it functions.
- b): Narrow/regroup the selectors (“facets”) in the left navigation.
As stated above, most of our subjects included the “discipline” or
“discipline groups” facets as part of the description of the site. This is
somewhat inaccurate, as those are merely options for organizing and
narrowing result information. It should be more apparent that the left
navigation modifies the search in the main panel. DfR has already
made progress in this regard. There has been a significant reduction in
the number of selectors/facets available as new beta versions of DfR
have been released.

Key Finding 5:
The graphs provided by the DfR system in the "Summary" and "References Profile" views are unclear in both purpose and presentation.
Evidence:
Since one of the main purposes of the DfR system is to enable users to visualize the JSTOR data they create and refine, several sample charts and diagrams are provided in the main DfR content frame to facilitate users' customization of metadata and to inspire their visual creativity. Although most of our participants confirmed that the graphs are attractive and helpful, reactions gathered during usability testing revealed that the graphs are not particularly helpful for clarifying result information or interpreting meaning.
One particularly significant issue is the "References Profile" page. A pre-task assessment question was designed specifically to assess what functions subjects expected from the "References Profile" page. Subjects by and large did not understand the purpose of this page, revealing a cognitive mismatch. Specifically, we asked subjects, "What do you expect will be under 'references profile'?" The answers our participants gave are provided below:
“The references profile I think it might be the graphs that I can see in this page.”
“I expect "references profile" to be something to do with citations.”
“REFERENCES PROFILE - Information about the sources cited in the articles
found.”
“For references profile, I think we can see the author's information, date,
publishers, etc.”
“I expect under "references profile" there will be the profile showing where they get
these JSTOR data.”
From the above responses, the title "References Profile" does not convey a clear meaning to users. Furthermore, the content of the "References Profile" page provides no clarification or contextual support for determining the page's purpose. In addition, comparing the page before and after a query, few significant changes are presented to users; the two charts in particular remain static (Figure 7).
Figure 7: The "References Profile" page (a) before any search and (b) after searching "dragon". As the two screenshots show, the page remains almost entirely static no matter what query users perform; the data presented appears to relate only to the entire JSTOR archive.
Initially, users open the DfR website at the "Summary" page, where exactly three charts, "Articles per year", "Articles by Discipline Groups" (pie chart), and "Articles by Discipline Groups" (bar chart), are displayed prominently (Figure 8). After users perform a search query, one new chart, "Relative articles per year" (a frequency graph), is added to the "Summary" page. There is not sufficient description or annotation of these graphs and their meaning, which makes it difficult to determine their purpose. For example, our subjects found it impossible to read the exact proportions of parts of the pie chart, since it provides no numbers. For the bar chart, our subjects could only read an approximate value from the x axis. For the newly added "Relative articles per year" chart, there is no indication that it is generated from the user's search. It is also unclear why this newly created chart is placed after the "Articles per year" chart rather than at the top of the page.
Related to the purpose of these four diagrams is their presentation. During our earlier investigation and other evaluation phases, we had not clarified why these graphs are presented on the main page, or whether they meet users' typical requirements. Although the charts may be intended as samples showing the extent of DfR's services, clear and concrete criteria for selecting them are needed if they are to make sense for data manipulation and to support further use of the refined data.
Figure 8: The "Summary" page (a) before any search and (b) after searching "dragon". Compared with the left screenshot, a new chart is created based on the user's search, but it is difficult to distinguish "Articles per year" from "Relative Articles per year".
Recommendations
- a): Re-examine the "References Profile" page to clarify its design purpose.
If the "References Profile" page survives this re-examination, its title should be replaced with a name that conveys its purpose clearly, and its content should be reworked and rearranged to fit that purpose. Alternatively, if the page is intentionally designed to remain static, independent of users' search results, it should be set apart explicitly and accompanied by a clear explanation or description.
- b): Add tooltips beside the graphs.
For the diagrams DfR provides, a simple way to clarify their purpose and usage is to add tooltips right beside them. There is ample space to the right of each chart on the page, so they would be easy to add, and no technical obstacles prevent their implementation.
- c): Conduct a survey or statistical analysis of current users to clarify which diagrams should be presented on the DfR website.
The behavior of current users should indicate which charts are actually used and which are preferred. The graphs shown on the "Summary" page should be the ones users actually use, and their order should follow a sorting criterion that places the more frequently used charts higher. Changes to these graphs should also be emphasized and clearly communicated to users.

Discussion
Usability testing with novice users generally verified the conclusions reached from our previous heuristic evaluations. In addition, the background and post-task reactions from participants provided more insight into our inconclusive findings from the surveying process. One drawback in the makeup of our participants is that they were around the same age (a 7-year difference between the oldest and youngest participants) and generally had very intensive research backgrounds. A couple of participants self-identified as casual researchers. There was also a trend in which the heavier researchers seemed to like DfR more and had less trouble using the system. In this sense, it is likely that a system designed for a serious researcher is quite different from a system designed for a light researcher, and the former is to be preferred. The makeup of our participants supports our finding that most intense academic researchers are also casual researchers; researching and gathering information is differentiated by purpose and not intensity.

Most of our findings and recommendations focused on the interface. Some big questions came up that we did not have time to address: Which information visualization and graphical analyses would be worth enhancing? What needs do users have for their research that this component could fill? Based on the results of our competitive analysis, we identified the information visualization and graphical component of DfR as both needed and significant.
When considering the frustrations and confusion encountered by our participants, we found that they were rooted in how our participants cognitively organized result information. Their cognitive framework is based on the "Google model" of search and retrieval. Google's model assumes that, until a query is entered, no articles or terms match. When a user types in search terms, Google's system finds the most popular articles (those with the most in-links) that contain ALL of the main search terms. Results are organized by the frequency and contextual relevance of the terms found in an article to the meaning the user was searching for. Data for Research was designed and operates in a framework that conflicts with this "Google method". DfR assumes that ALL articles in the JSTOR database are relevant before any search terms are entered; the query a user generates is meant only to exclude irrelevant information, in terms of the word frequencies found in each JSTOR article. Results are organized by the frequency with which search terms appear in the article text. These differences in searching and organization reveal a serious divergence between the developer's and the users' cognitive models.
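The toy sketch below (our own illustration over an invented three-document corpus) makes the divergence concrete: a Google-style query stands on its own and matches documents containing all of its terms, while a DfR-style query only adds criteria to the running selection, so the result set can shrink but never resets.

# Invented miniature corpus, for illustration only.
CORPUS = [
    {"title": "Witchcraft in New England"},
    {"title": "Party Politics and Republicans"},
    {"title": "Politics of Witch Trials"},
]

def matches(doc, terms):
    # True if the document title contains every term (a stand-in for word-frequency matching).
    return all(t.lower() in doc["title"].lower() for t in terms)

def google_style(corpus, terms):
    # Each query stands alone: start from nothing, return documents matching the new terms.
    return [d for d in corpus if matches(d, terms)]

def dfr_style(corpus, running_criteria, new_term):
    # Criteria aggregate: the new term joins the running set and the results only narrow.
    running_criteria = running_criteria + [new_term]
    return running_criteria, [d for d in corpus if matches(d, running_criteria)]

print(len(google_style(CORPUS, ["witch"])))            # 2: a fresh search
criteria, r1 = dfr_style(CORPUS, [], "politics")       # 2 results
criteria, r2 = dfr_style(CORPUS, criteria, "witch")    # 1: narrowed, not restarted
print(len(r1), len(r2))                                # 2 1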

Users seemed to infer their cognitive model of searching from the hierarchical menu structure positioned at the top of the page. This lends itself to forward and depth-first search methods (Norman, 2008). These models are directly transferred from the way users search for information in the real world. It must be recognized that the methods by which users search for information differ between a website in a browser (which affords functions such as "back", "forward", and bookmarking) and offline software (which affords backwards search methods, in that users are more familiar with the final result than with the steps to reach it [Norman, 2008]).
When users encountered problems in the DfR system, they continued
to use the model of "depth-first" searching, though the DfR was
designed to start with the results and modify those results to fit a
user's needs. We have made recommendations to attempt to bridge
the gap between the two divergent models, so that the DfR can be
optimized to best serve users, but further research should be
conducted to verify and detail the cognitive framework users of the
DfR system are operating under. This will provide more comprehensive
information to create a product that is both innovative and also suits
the needs of future DfR users.

Conclusion
Most of our findings relate to visibility of system status: the user lacks sufficient cues as to what the system is doing at a given moment. The user may be doing the right thing, but cannot know for sure from the visual cues provided on the page. Users do not have a clear visual indication that the links at the top relate to the search they performed on the main page; we recommend restructuring where these links sit on the page in order to make the substructure clear. Users do not have enough indication of which search query the results list reflects; we recommend adding this information next to the number that indicates how many results were found, in Google style. Many users failed to notice a key feature of the site: adding more search terms further filters the results rather than starting a new search. We recommend making the "Selection Criteria" box part of the main grey search box, where users will have ready access to that information and where it will be prominent and easy to see. Some of the graphs provide unclear information, with no explanation of their meaning available; a simple tooltip that appears when the user hovers the mouse over a graph would solve this issue. Further usability research should examine what kinds of graphs would be most useful to the user base, in order to expand the graphical display feature, since that is the most distinctive feature DfR offers.

References
Burns, J., Brenner, A., Kiser, K., Krot, M., Llewellyn, C., & Snyder, R. (2009). JSTOR - Data for Research. Research and Advanced Technology for Digital Libraries, Vol. 5714. Springer Berlin/Heidelberg.
JSTOR. (2010). http://www.jstor.org
JSTOR Facts. (Oct. 2009). http://news.jstor.org/jstornews/2009/11/october_2009_vol_13_issue_3_js.html
JSTOR DFR. (2010). http://DFR.jstor.org
Norman, K. L. (2008). Cyberpsychology: An Introduction to Human-Computer Interaction. New York, NY: Cambridge University Press.
Appendices Introduction
The following appendices provide more detailed information about our sessions and the evidence gathered from usability testing with the DfR interface.
In Appendix A, we have included all scripting that structured the
usability testing sessions. Task descriptions have been included in the
scripts for the lead researcher’s knowledge, and are duplicated on the
logging form.
In Appendix B, we have included a blank logging form, created for the
tester to record during the usability testing.
In Appendices C and D, we have included the pre-task questionnaire
form and post-task questionnaire form, respectively.
In Appendix E are the completed logging forms (with some redaction
to protect our participants’ personal information) used for each
participant during our testing sessions.
Appendix A: Introduction Script

Scripting for Usability Test


[General introduction to the project - structure of session]

[Introduce yourself!]
Hi, thanks for coming (or other similar greeting)!
I am _______ (your name), and I will be leading this session. This is
_______ (partner's name), he/she will be taking notes and working with
the technical aspects of the session.

[Vague background about our project - DON'T SAY WHAT IT DOES YET]
We are working with JSTOR, a non-profit archive of journal and article
works. You may have heard of them or used their interface to conduct
research. JSTOR has commissioned a team of developers to create a
tool. This tool is called Data for Research (DfR for short). As part of our
evaluation, we will be assessing how users (that's you!) interact with
JSTOR Data for Research website interface. The session will be
conducted in four parts.
First, we will ask you a few demographic questions. None of the
information you provide will be publicized, and you will remain
anonymous in our results. We will not collect your name or other
private personal identifiers.
Next, we will ask you a few questions about the system, to see what
you might expect from it. Remember, we are testing the SYSTEM,
not your abilities - there are no "right" or "wrong" answers, nor
is there any competition with other users involved.
Then, we will ask you to complete five (5) tasks related to the
DfR system - I will explain more about the process when we come to
it.
Lastly, we will ask you some reflective questions about using the
system and the interface.

Your participation in this session is completely voluntary. If you feel uncomfortable answering a question or with an activity during the session, or you simply wish to discontinue the session, please let us know, and we will end the session.
The entire process should take about 30 - 45 minutes.

Are you ready to begin? (Yes/No; need "Yes" to proceed)

Intro to Pre-Test [talk about terms that they may see on the
site; no right or wrong answers]

Again, this part of the session is to assess what you expect from the
system. We will show you a few pictures of frames from the DfR
website, and ask you some questions. Remember, there are no "right"
or "wrong" answers to these questions, so please speak freely.

Do you understand, and are you ready to proceed? (Yes / Yes, but provide clarification / No)

- PRE-TASK: [show them the site; ask what they think]

[Intro to Testing Session]


Now, we are ready to start the usability session.
(More detailed background about JSTOR and DfR) We are
working with JSTOR, a non-profit archive of journal and article works.
You may have heard of them or used their interface to conduct
research. JSTOR has commissioned a team of developers to create a
tool for analyzing data about the articles in JSTOR (this is called
"metadata"). This tool is called Data for Research (DfR for short).
(The task session) There are five (5) scenarios written on paper that
we will read through. Please read along with us. Each scenario will give
some background and then ask you to perform a task using the DfR
interface. We have the computer set up here in front of you.
Use the computer (the browser has been set to the DfR website
already) to complete the task. As you are working, we ask that you
verbally speak and tell us what actions you are performing, why you
are performing them, and other comments about how you will
complete the task. This is part of the "think-aloud" process. _____
(partner's name) will take a few notes and observe you as you are
working. When you believe that you have completed the task, please
let me know by saying, "I'm done" or "I think I'm finished".
(What you can and cannot do to help) I cannot provide any help or
hints about completing the task unless you tell me that you need help
or that you are stuck. I will be able to provide clarification about the
task and scenario when needed. I may also encourage you to speak
aloud if you are not talking that much.
(Other informed consent info)
Again, there is a program that will record your mouse movements and
interaction as you are on the site, but your image or voice will NOT be
recorded.
Remember that we are testing the system, and not your abilities.
As you are working with the system, we will be using a program that
will monitor your mouse movements around the screen and within the
system interface. Your face and voice will not be recorded by this
program.
If you feel uncomfortable at any point, let us know and we can stop the
session.

Do you understand, and are you ready to proceed? (Yes / Yes, with clarification / No)

- TASKS

[Post-testing and reflections]

Now that we have completed the scenarios portion of the session, we want to ask you to reflect on using the DfR site. We will show you some pictures, and ask you what you expect from those parts of the site or what you expect will happen when you use this function/click this button. These may be similar to the pictures in the pre-task, but please don't feel that you have to give the same answers. We are not doing performance testing - simply say whatever first comes to your mind.
After this portion, we will ask you a few reflective questions about
using the site, and ask for recommendations. We will not disclose your
information to any third-party units, and all of your comments will be
anonymized if they are used in any published reports. The notes that
we have made about your session will be destroyed after our
assessment is complete (within the month).

Are you ready to proceed? (Yes/No)


- POST-TEST REFLECTIVE QUESTIONS
Appendix B: Logging Form
Logging Form
Participant number:
Participant name:
Date:
Time:

BACKGROUND INFO
1. Age:
2. Gender: male/female
3. Occupation:
4. Highest education completed/current education background?
5. Please choose the statement that most fits regarding your research:
(A) I consider myself a casual researcher
(B) I consider myself a moderate researcher
(C) I consider myself a serious researcher

PRE-TASK:
a. From the front page, is it clear to you what this site is about?

IF YES: what is it about? What do you think DFR's relationship with JSTOR is?

b. What do you expect to see under "results list"?

What do you expect to see under "references profile"?

c. What do you expect will happen when you click on any of these links on the side?

User Testing Task logging form:

[Task #1: Results Breakdown]
DFR gives you graphical information about search results. In this task, please search for the term "politics", then report which discipline area contains the largest number of results. If possible, check how many articles are in the social sciences.
How to do it:
Part 1: just type the search in the search bar, scroll down to the graph, and observe the different sections of the pie chart.
Part 2: two ways to do it. One is to open "discipline group" and see the number in parentheses under "social sciences". The other is to download the chart data under the pie chart.
[ ] Success: finding the graph and pointing/indicating it's the "social sciences" part of the pie chart. The total number is: 703,557. For social sciences: 446,696.

Time Screen Type Note

[Task #2: Find Related Key Terms]
Please find the three most common key terms associated with the term "witches". Tell us your findings: which three key terms are most frequently associated with the query "witches"?
How to do it:
Run a search on "witches", click on "key terms", and observe the 3 biggest ones.
[ ] Success: woman, witch, witchcraft

Time Screen Type Note

[Task #3: Filtered Searching]
Suppose you are researching the origins of the universe. So you go to JSTOR's Data for Research site and search for the term: "origins of the universe". But to your surprise, as you look over the graphs, you realize that about half of the results are in the humanities.
In this task, you need to adjust the search criteria so as to have only results in the sciences show in the graphs. You want the graphs to show what you might expect: most results in the sciences. Note that 'mathematics' counts here as a science. Give us the total number of your final result.
[ ] Success: Set "Science" and not "Humanities" in the criteria of the discipline group in the list. Report the number of the final result.

Time Screen Type Note


[Task #4: Adjust searching]
Please run a search of three terms: "politics", "republicans" and "witch" and note the
total number of results. Then we decide 'witch' has nothing to do with the political
query. So remove JUST the term "witch". What is the resulting number of articles?
[ ] Success: For all three terms: 735,304 articles found. After 'witch' is removed: 714,244 articles total. Note: the point of this task is to see if they can remove search criteria (which is hard on JSTOR).

Time Screen Type Note


[Task #5: Advanced Search]
This task has two components. First, find the most cited article related to the
keyword “dragons”, and tell us the title of this article. Then, find the most frequent
key word associated with this article (the one you just found). If possible, give us the
exact frequency of those words.
How to do it:
Part one: run a search on "dragons", then go to "result list". Under "sort by", choose "times cited (descending)".
Part two: click on the "Key Terms" link right under the article, and look at the two key terms on the page that appears.
[ ] Success: the article is "individualism and psychology / tyler burge / the philosophical review vol. 95 no. 1 (jan. 1986)". The two most frequent key terms are "psychology (1.0)" and "individualistic (0.865)".

Time Screen Type Note

Type: CU = comment by user; E = Error; ! = Critical Incident


Post-task:
Debriefing/ Reflective session

- ask about impressions of the system

- ask about best/interesting features

- ask about frustrations encountered while completing tasks & how you would prefer
it to be?
Appendix C: Pre-Task Questionnaire
Appendix D: Post-Test Questionnaire
Appendix E: Redacted and Filled-Out Logging Forms
1. Participant 1
2. Participant 2
3. Participant 3
4. Participant 4
5. Participant 5

Logging Form
Participant number: 5
Participant name:
Date: 4/19
Time: 11:00 pm

BACKGROUND INFO
1. Age: 24
2. Gender: male/female: female
3. Occupation: Student
4. Highest education completed/current education background? Master
5. Please choose the statement that most fits regarding your research: (C )
(A) I consider myself a casual researcher
(B) I consider myself a moderate researcher
(C) I consider myself a serious researcher

PRE-TASK:
a. From the front page, is it clear to you what this site is about?

IF YES: what is it about? What do you think DFR's relationship with JSTOR is?

b. What do you expect to see under "results list?"

What do you expect to see under "references profile"?

c. What do you expect will happen when you click on any of these links on the side?

User Testing Task logging form:


[Task #1: Results Breakdown]
DFR gives you graphical information about search results. In this task, please search for the term "politics", then report which discipline area contains the largest number of results. If possible, check how many articles are in the social sciences.
How to do it:
Part 1: just type the search in the search bar, scroll down to the graph, and observe the different sections of the pie chart.
Part 2: two ways to do it. One is to open "discipline group" and see the number in parentheses under "social sciences". The other is to download the chart data under the pie chart.
[X] Success: finding the graph and pointing/indicating it's the "social sciences" part of the pie chart. The total number is: 703,557. For social sciences: 446,696.

Time | Screen | Type | Note
11:01 | Main | ! | Didn't know what happened after she clicked on the search button on the search bar.
| Main | CU | "no response"
11:05 | Main | E | Looked back at the left sidebar and clicked on the search criterion "political science" under the discipline group.
11:06 | Main | ! | Seems confused by keyterm and discipline; cannot distinguish the type of criteria she used in the query.

[Task #2: Find Related Key Terms]
Please find the three most common key terms associated with the term "witches". Tell us your findings: which three key terms are most frequently associated with the query "witches"?
How to do it:
Run a search on "witches", click on "key terms", and observe the 3 biggest ones.
[X] Success: woman, witch, witchcraft

Time | Screen | Type | Note
11:10 | Main | E | Didn't clear the previous searching criteria.
11:11 | Main | ! | Directly typed "witches" and got no results.
11:12 | Main | CU | "what's extracted keyterm?"
11:24 | Keyterm | ! | She got to the right page but doesn't know the meaning of the size of the keyterms. Tried to click on "download the keywords" to get the answer.
11:25 | Keyterm | CU | "Why the list of the keyterm is not ordered"
11:25 | Keyterm | ! | Tried to tell the answer by looking up the exact number of the keyterm in the list.
[Task #3: Filtered Searching]
Suppose you are researching the origins of the universe. So you go to JSTOR's
Data for Research site and search for the term: "origins of the universe". But to
your surprise, as you look over the graphs, you realize that about half of the
results are in the humanities.
In this task, you need to adjust the search criteria so as to have only results in
the sciences show in the graphs. You want the graphs to show what you might
expect: most results in the sciences. Note that 'mathematics' counts here as a
science. Give us the total number of your final result.
[Y] Success: Set "Science" and not "Humanities" in the criteria of the discipline group in the list. Report the number of the final result.

Time | Screen | Type | Note
11:26 | Main | E | Forgot to erase the previous searching criteria.
11:27 | Main | ! | Typed "the original of the universe" in the search bar.
11:28 | Main | ! | Went to the left sidebar, clicked on discipline "science", and then got the answer.
[Task #4: Adjust searching]
Please run a search of three terms: "politics", "republicans" and "witch" and note
the total number of results. Then we decide 'witch' has nothing to do with the
political query. So remove JUST the term "witch". What is the resulting number
of articles?
[X] Success: For all three terms: 735,304 articles found. After 'witch' is removed: 714,244 articles total. Note: the point of this task is to see if they can remove search criteria (which is hard on JSTOR).

Time | Screen | Type | Note
11:28 | Main | ! | Cleared all previous criteria.
11:29 | Main | ! | Did the multiple search as people do in Google search: typed the keywords in a row, separated by blanks.
11:30 | Main | ! | Typed again, but without witch: "politics republicans".
11:31 | Result list | ! | Cleared all again.
[Task #5: Advanced Search]
This task has two components. First, find the most cited article related to the
keyword “dragons”, and tell us the title of this article. Then, find the most
frequent key word associated with this article (the one you just found). If
possible, give us the exact frequency of those words.
How to do it:
Part one: run a search on "dragons", then go to "result list". Under "sort by", choose "times cited (descending)".
Part two: click on the "Key Terms" link right under the article, and look at the two key terms on the page that appears.
[Y] Success: the article is "individualism and psychology / tyler burge / the philosophical review vol. 95 no. 1 (jan. 1986)". The two most frequent key terms are "psychology (1.0)" and "individualistic (0.865)".

Time | Screen | Type | Note
11:32 | Result list | ! | Went to the result list and typed "dragons" in the search bar.
11:33 | Result list | ! | She seems to treat the search bar and the items on the navigation as independent.
11:34 | Summary | ! | To find the most cited article, she went back to the summary page and tried to figure it out by browsing the graphics.
11:35 | Summary | ! | Seems like she struggled with "cited article".
11:37 | Result list | ! | Eventually found the "sort by" dropdown menu and got the right answer.
11:37 | Result list | ! | There is no number of citations shown on the list, so she could not confirm that what she had done was right. Very confused by the content of the search result because she could not find any word like "dragon" in the article she got.
11:38 | Result list | CU | "There should be 'dragon' there"
Type: CU = comment by user; E = Error; ! = Critical Incident

Post-task:
Debriefing/ Reflective session

- ask about impressions of the system

- ask about best/interesting features

- ask about frustrations encountered while completing tasks
- thank them for coming
