
YVONNE LEOW

INNOVATION PROPOSAL

Local newspapers have been documenting our lives for centuries, but in an increasingly data-driven world, they are failing to realize their potential. While other industries are embracing data science
to make smarter business decisions, newspapers are not. Instead, decades of institutional knowledge are
squandered in outdated archival systems or buried in the minds of journalists who eventually retire or
leave the industry. This matters because in a marketplace where virality and pageviews are valued over
substance, newspapers are losing their identity. They cover towns that national media rarely reach,
yet their newsrooms continue to shrink. It is time for that to change.
French political thinker Alexis de Tocqueville once said, "When the past no longer illuminates
the future, the spirit walks in darkness." I would like to develop a tool that analyzes and visualizes
information from past stories to better inform a newspaper's editorial and operational strategies. It would
save a tremendous amount of time and resources and, more importantly, enable journalists to better
serve their communities.
The Problem with Search
The majority of newspaper companies, such as Gannett and the Tribune Company, have digital
archives, but those archives are limited to a basic search engine. When reporters type in keywords like
"Maricopa County government," they have to dig through hundreds, if not thousands, of results to find
what they are looking for. Rather than poring over pages and pages of headlines, journalists could
generate data visualizations to quickly inform their decision-making. This would improve local journalism
on two fronts.
It could help reporters:

Connect the dots. Historical trends and patterns are easier to spot when they are displayed in a
chart, map or graph. Imagine a tool that could illustrate how neighborhood property development
has expanded or stalled in the past decade, or one that could scrape names and titles from articles to
map the relationships among public officials, religious leaders and small business owners.
(See Figure A below.)

Discover story ideas. Local newsrooms are filled not only with rookie reporters but also with veterans
who have covered the same beat for years. This tool could help them rethink their approach: they could
see how frequently they quote the same sources, or whether they may be neglecting a community in
their coverage.

Fact-check sources. Reporters often rely on archive searches to fact-check political figures.
Instead of searching for matching keywords and phrases, they could use this tool to aggregate past quotes
and verify, for example, whether the mayor's political rhetoric changed during his or her administration.
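The source-frequency check described under "Discover story ideas" could start as plain counting once each archived story has been tagged with its byline and the people it quotes. The sketch below is a minimal illustration under that assumption only: the story records, the field names (`reporter`, `sources`) and the helpers `source_counts` and `top_sources` are hypothetical, and the names in the sample data are invented placeholders rather than real archive content. Pulling quoted sources out of raw article text is assumed to happen in an earlier NLP step and is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: each archived story tagged with its byline and
# the people quoted in it. Names are invented placeholders; extracting
# quoted sources from article text is assumed to happen earlier (not shown).
stories = [
    {"reporter": "A. Reporter", "sources": ["Mayor Smith", "Chief Lee"]},
    {"reporter": "A. Reporter", "sources": ["Mayor Smith"]},
    {"reporter": "B. Reporter", "sources": ["Pastor Diaz", "Mayor Smith"]},
]


def source_counts(stories):
    """Return {reporter: Counter of sources}, counting each source once per story."""
    counts = defaultdict(Counter)
    for story in stories:
        # set() so a source quoted several times in one story counts once
        counts[story["reporter"]].update(set(story["sources"]))
    return counts


def top_sources(stories, reporter, n=3):
    """Return the n sources a reporter quotes most often, with story counts."""
    return source_counts(stories)[reporter].most_common(n)
```

With the sample data, `top_sources(stories, "A. Reporter")` returns `[("Mayor Smith", 2), ("Chief Lee", 1)]`. Counting each source once per story, rather than once per quote, keeps a single long interview from dominating the tally.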

(Figure A)
With limited staff and shrinking budgets, editors have to be smarter about how they direct their coverage.
Newsroom managers could also use this tool to:

Identify coverage gaps. Editors could visualize how many stories have been written about
school districts in upper-class versus working-class neighborhoods in the past five years. They
could use that kind of information to guide their beat reporters.

Enforce accountability. In addition to quantifying facts and figures, editors could use sentiment
analysis to measure the tone of their coverage. The tool could break down, for instance, how many
negative stories have been published about predominantly Hispanic and Asian immigrant
communities compared to white middle-class communities. (See Figure B below.) Hidden biases
may exist, and newsroom leaders could use this tool to surface them and diversify their reporting.

Create a source of revenue. Most small nonprofits, businesses and government agencies share
the same challenges as newspapers: they do not have the resources to integrate data analytics
into their operations. They would likely be interested in the data, or in a way to visualize
years of information about their cities. If newspapers continue to add reporting to the
database, they could potentially sell this tool or the technology powering it to local
organizations.
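The coverage-gap and sentiment comparisons described for newsroom managers could be prototyped as a tally over tagged stories. This is a sketch under stated assumptions: each story is assumed to already carry a year, a neighborhood group label and a sentiment score between -1 and 1 from some sentiment model (no particular model is specified here), and the `NEGATIVE_CUTOFF` threshold, the field names, the `coverage_summary` helper and the sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical threshold: sentiment scores below this count as "negative".
NEGATIVE_CUTOFF = -0.2

# Hypothetical sample records; a real archive would supply these tags
# from earlier tagging and sentiment-analysis steps (not shown).
stories = [
    {"year": 2012, "group": "working-class", "sentiment": -0.6},
    {"year": 2012, "group": "upper-class", "sentiment": 0.3},
    {"year": 2013, "group": "upper-class", "sentiment": 0.1},
    {"year": 2013, "group": "upper-class", "sentiment": -0.5},
]


def coverage_summary(stories, since_year):
    """Map each neighborhood group to (story_count, share_of_negative_stories),
    counting only stories published in or after since_year."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for story in stories:
        if story["year"] < since_year:
            continue  # outside the time window being reviewed
        totals[story["group"]] += 1
        if story["sentiment"] < NEGATIVE_CUTOFF:
            negatives[story["group"]] += 1
    return {group: (count, negatives[group] / count) for group, count in totals.items()}
```

With the sample records, `coverage_summary(stories, since_year=2012)` reports one working-class story, which is negative, against three upper-class stories, one of which is negative. In practice the cutoff would need tuning against stories that editors have labeled by hand.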
The problem newsrooms face is that their data is meaningless without context. By analyzing and
visualizing decades of historical coverage, this tool would supply that context.

(Figure B)
The Era of Big Data
Organizations of all kinds use natural language processing (NLP), machine
learning and data visualization techniques to address industry challenges. Palantir Technologies is a
notable data analytics company that tackles issues like climate change and cyber attacks. A professor at
Columbia University created MedLEE (Medical Language Extraction and Encoding System), which
extracts medical information from past patient reports. A Stanford Ph.D. graduate founded a startup
called Ayasdi, which uses topological data analysis and machine learning to visualize massive data sets
without requiring users to write algorithms or queries. On the visualization side, several private software
companies and open-source platforms help organizations create charts and infographics. Tableau, Visual.ly,
CartoDB, Google Fusion Tables and Timeline.js are just a few examples. Even data journalism is not new:
Knight-funded projects like DocumentCloud and Overview process thousands of PDFs and government
document dumps. Data analysis, however, has primarily been a way to tell a story. While the underlying
technology exists, it has not commonly been applied to help reporters cover their communities,
streamline internal operations and earn potential revenue.
A Year at Stanford
If awarded a Knight Fellowship, I would create a functional prototype of my proposal by the end
of the year. Here is my plan of action:

Months 0-3: I want to become fluent in the latest NLP, machine learning and sentiment analysis
techniques. I would immerse myself in Stanford's computer science department, particularly
the nationally recognized NLP group led by Christopher Manning. Given my role at Digital First
Media and my relationships with managing editors in the Bay Area News Group, I am confident I
can work with local newspapers to access their archive data. I would gather a three-year sample
of archived news stories and then build a graph database in which I would assign relationships
between names, locations, organizations and other relevant terms.

Months 3-6: Design would be the next step. I hope to enroll in Stanford d.school courses to
learn information visualization and user experience design. I will also take advantage of
the university's proximity to Silicon Valley, particularly Palo Alto-based companies like Palantir and
Ayasdi, to interview data scientists and product managers. The objective is to begin visualizing
the article data and wireframing a user interface for the prototype. I will gather feedback from
Stanford professors and local journalists to refine the design.

Months 6-10: During these last few months, I hope to enroll in entrepreneurship courses and
develop a working web application that visualizes the sample data. My goal is to convince editors
that it is not only feasible but also imperative to integrate data science into local journalism.
After demonstrating the project to Knight colleagues, Stanford professors and industry peers, I
would like to present the idea to newspaper groups and other potential funders.
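The graph database step planned for Months 0-3 depends on a rule for deciding when two entities are related. One simple assumption, used in the sketch below, is co-occurrence: two names are linked whenever they appear in the same article, and the link grows stronger with each additional article. The entity lists, sample names and the helpers `cooccurrence_edges` and `neighbors` are hypothetical; entity extraction is assumed to have been done beforehand, and a working prototype would load these weighted edges into an actual graph database rather than keep them in memory.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: named entities (people, places, organizations)
# already extracted from each archived article. Names are placeholders.
articles = [
    ["Mayor Smith", "City Council", "Main Street"],
    ["Mayor Smith", "City Council"],
    ["Pastor Diaz", "Main Street"],
]


def cooccurrence_edges(articles):
    """Weighted edge list: two entities are linked once per article naming both."""
    edges = Counter()
    for entities in articles:
        # sorted(set(...)) gives each unordered pair one canonical key
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def neighbors(edges, entity):
    """Entities linked to `entity`, strongest connection first."""
    linked = Counter()
    for (a, b), weight in edges.items():
        if a == entity:
            linked[b] += weight
        elif b == entity:
            linked[a] += weight
    return linked.most_common()
```

For the sample data, `neighbors(cooccurrence_edges(articles), "Mayor Smith")` returns `[("City Council", 2), ("Main Street", 1)]`, the kind of ranked connection a relationship chart could display. Co-occurrence is deliberately crude; it would need refinement before it could be trusted to imply a real-world relationship.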

The Power of Legacy


At the end of the day, it is not all about the data. This proposal is about changing local
newsroom culture to be smarter and more effective about how journalists cover their communities. In an
era when newspapers are contracting and rapidly falling behind, this is an effort to help them evolve. To
innovate is to make changes in something established by introducing new methods, ideas or products.
With the support of the Knight Fellowship, I want to help newspapers realize that their legacy does not
have to be a burden; it can be a competitive advantage. The key to their future is simply buried in their
past.
