
5 Text Analytics Fundamentals You Should Know

Text analytics is becoming more mainstream, and interest in it is growing. Here are five basic things about text analytics you need to know now.

By Fern Halper

June 23, 2015

Data extracted from text can be extremely helpful in answering questions involving why and what. For example: Why are my customers unhappy? What is causing a specific problem in my operations? What are the predictors of a certain risk? Why is my brand reputation declining?
This text data comes from internal sources such as call center notes, e-mail messages, customer records, and claims. It also comes from external sources such as social media. TDWI research (Best Practices Report: Next-Generation Analytics and Platforms) indicates that text analytics is becoming more mainstream and that interest in it is growing. In a recent TDWI best practices report, for instance, 22 percent of respondents were already using text analytics and 36 percent were planning to use it in the next three years.
If you're considering text analytics for your organization, here are five
fundamentals you should know.
1. Text analytics is different from search. Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into structured information that can be leveraged in various ways. The analysis and extraction process takes advantage of techniques that originate in computational linguistics/natural language processing (NLP), statistics, and machine learning. Text analytics is about extracting information from text; search is about retrieving documents, typically when end users already know what they are looking for. Text analytics can also be used to augment search (as is often done in commercial search engines).
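To make the contrast concrete, here is a minimal Python sketch under simplifying assumptions: the two sample notes, the `search` and `extract` helpers, and the date pattern are all hypothetical and not taken from any product. Search hands back the documents that match what the user already knew to ask for, while extraction turns every document into a structured record that can be counted, filtered, or joined.

```python
import re

# Hypothetical sample documents (invented for illustration).
DOCS = {
    "note-1": "Customer called on 2015-06-01 about a late refund from Acme Corp.",
    "note-2": "Customer praised the new billing portal.",
}

# Toy pattern for ISO-style dates such as 2015-06-01.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def search(docs, query):
    """Search: return the IDs of documents whose text contains the query string."""
    q = query.lower()
    return [doc_id for doc_id, text in docs.items() if q in text.lower()]


def extract(docs):
    """Extraction (toy version): turn every document into a structured record."""
    records = []
    for doc_id, text in docs.items():
        records.append({
            "doc_id": doc_id,
            "dates": DATE_RE.findall(text),
            "mentions_refund": "refund" in text.lower(),
        })
    return records


if __name__ == "__main__":
    print(search(DOCS, "refund"))  # the user must already know to ask about refunds
    for record in extract(DOCS):
        print(record)  # one structured row per document, ready to aggregate
```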

2. Text analytics can be used to extract various kinds of information. The typical kinds of information extracted from text include:

Terms: Another term for keywords.

Entities (often called named entities): Examples include names of persons, companies, products, geographical locations, dates, and times. Entities are generally about who, what, where, and when.

Concepts: Sets of words and phrases that indicate a particular idea or meaning in which the user is interested. A concept might be "cost of living increase" or "healthcare benefits." A particular piece of content is generally "about" only a few concepts.

Sentiment: Sentiment reflects the tonality or point of view of the text. For example, the concept "unhappy customer" would indicate negative sentiment.

Different vendors often use different terms to describe this kind of information; some talk about facts, themes, topics, and events. It is important to understand what each vendor offers. For instance, entity extraction alone may not be useful to your organization, or a given vendor may not offer sentiment capabilities out of the box.
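The four kinds of information listed above can be illustrated with a deliberately simple, rule-based Python sketch. Commercial tools rely on NLP, statistics, and machine learning rather than hand-made word lists, so treat this only as an illustration of what each kind of output looks like; the stopword list, company names, concept phrases, and sentiment lexicon are all hypothetical.

```python
import re
from collections import Counter

# Every word list below is a hypothetical, hand-made example.
STOPWORDS = {"i", "am", "my", "is", "and", "the", "on", "from", "a", "an",
             "of", "to", "again", "still"}
KNOWN_COMPANIES = {"Acme Corp", "Globex"}
CONCEPTS = {
    "cost of living increase": ["cost of living", "yearly increase"],
    "healthcare benefits": ["medical plan", "health insurance"],
}
POSITIVE = {"happy", "great", "helpful", "love"}
NEGATIVE = {"unhappy", "angry", "late", "broken"}

WORD_RE = re.compile(r"[a-z']+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_terms(text, top_n=3):
    """Terms: the most frequent non-stopword keywords."""
    words = [w for w in WORD_RE.findall(text.lower()) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(top_n)]


def extract_entities(text):
    """Entities: known company names (exact match) and ISO-style dates."""
    companies = sorted(name for name in KNOWN_COMPANIES if name in text)
    return {"companies": companies, "dates": DATE_RE.findall(text)}


def extract_concepts(text):
    """Concepts: a concept is present if any of its indicator phrases appears."""
    lowered = text.lower()
    return [concept for concept, phrases in CONCEPTS.items()
            if any(phrase in lowered for phrase in phrases)]


def score_sentiment(text):
    """Sentiment: number of positive minus negative lexicon words."""
    words = WORD_RE.findall(text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    note = ("I am unhappy: my refund from Acme Corp is late again, and the "
            "medical plan changed on 2015-06-01. The refund is still late.")
    print(extract_terms(note, top_n=2))
    print(extract_entities(note))
    print(extract_concepts(note))
    print(score_sentiment(note))
```

The sentiment rule is as crude as it looks: a phrase such as "not late" would still count as negative, which is why real tools use richer linguistic analysis.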
3. You may need to consider a taxonomy. In common usage, a taxonomy is a method for organizing information into hierarchical relationships. This is important in text analytics, especially when you're dealing with industry-specific vocabularies. For instance, you may create a taxonomy of products and services or of certain kinds of diseases.

The taxonomy can also use synonyms and alternate expressions. For instance, "yearly increase" might refer to "raises." Some vendors will provide baseline taxonomies out of the box, but don't expect them to work without adjustment. Some vendors will tell you that you don't need a taxonomy because they work from prebuilt semantic networks that represent the world or because they have developed techniques that get around the need for one. For certain subjects, you may get by without building a taxonomy, but be prepared to iterate on what comes out of the tool in order to create your own categories.
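A taxonomy with synonyms can be sketched as two small Python mappings: one records each term's parent in the hierarchy, and the other maps alternate expressions to the preferred term. The terms, synonyms, and helper names here are hypothetical; a real taxonomy would be built and refined with subject-matter experts, iterating on what the tool produces.

```python
import re

# Hypothetical taxonomy fragment: each term maps to its parent (None marks the root).
PARENT = {
    "compensation": None,
    "raises": "compensation",
    "benefits": "compensation",
    "healthcare benefits": "benefits",
}

# Hypothetical synonyms and alternate expressions mapped to the preferred term.
SYNONYMS = {
    "raises": "raises",
    "yearly increase": "raises",
    "salary increase": "raises",
    "healthcare benefits": "healthcare benefits",
    "medical plan": "healthcare benefits",
    "health insurance": "healthcare benefits",
}


def path_to_root(term):
    """Walk up the hierarchy and return the path from the root down to `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return list(reversed(path))


def categorize(text):
    """Map every synonym found in `text` (whole-phrase match) to its taxonomy path."""
    lowered = text.lower()
    found = {
        preferred
        for phrase, preferred in SYNONYMS.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    }
    return sorted(" > ".join(path_to_root(term)) for term in found)


if __name__ == "__main__":
    comment = "Will the yearly increase cover the new medical plan premiums?"
    for category in categorize(comment):
        print(category)
```

Matching on whole phrases (the `\b` word boundaries) keeps a short synonym from firing inside an unrelated longer word.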
4. You can analyze the data separately or marry it with structured data. Organizations that use text data often integrate it with traditional data sources for analysis, viewing it as simply another form of data. Analyzing text data without merging it with other data in your systems can also be quite informative; social media data, for instance, is often analyzed this way. Some organizations are even building predictive models from text data alone that perform as well as or better than models that use both text and traditional structured data. The right approach depends on the kind of data you want to analyze and the business problems you're trying to solve.
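One common way to marry the two is to roll text-derived results up to the same key the structured data uses and then join them. The Python sketch below assumes sentiment labels have already been extracted from call center notes; the customer IDs, fields, and the `negative_share` feature are hypothetical. The merged rows could then feed a predictive model alongside, or instead of, the structured fields.

```python
from collections import defaultdict

# Hypothetical structured records (for example, from a CRM), keyed by customer ID.
CUSTOMERS = {
    101: {"plan": "premium", "tenure_years": 4},
    102: {"plan": "basic", "tenure_years": 1},
    103: {"plan": "basic", "tenure_years": 7},
}

# Hypothetical (customer ID, sentiment label) pairs already extracted from notes.
NOTE_SENTIMENT = [
    (101, "negative"),
    (101, "negative"),
    (102, "positive"),
    (101, "positive"),
]


def merge_text_features(customers, note_sentiment):
    """Aggregate note sentiment per customer and join it onto the structured records."""
    labels_by_customer = defaultdict(list)
    for customer_id, label in note_sentiment:
        labels_by_customer[customer_id].append(label)

    merged = {}
    for customer_id, record in customers.items():
        labels = labels_by_customer.get(customer_id, [])
        negatives = sum(1 for label in labels if label == "negative")
        merged[customer_id] = {
            **record,
            "note_count": len(labels),
            # Customers with no notes get 0.0 rather than a division by zero.
            "negative_share": negatives / len(labels) if labels else 0.0,
        }
    return merged


if __name__ == "__main__":
    for customer_id, row in merge_text_features(CUSTOMERS, NOTE_SENTIMENT).items():
        print(customer_id, row)
```

Whether a customer with no notes should get 0.0 or a missing value is a modeling decision; the sketch picks 0.0 for simplicity.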
5. A different mindset is required for analyzing text data. Text analytics does not have the same level of accuracy as some statistical techniques. It is best to think of its results as directionally correct, so go into the analysis with that perspective. The level of analytical skill required will depend on the problem you're trying to solve. Understanding natural language processing is generally not a prerequisite for text analytics, although some training on the text analytics tool will be necessary.
Learn More
Interested in text analytics? Want to try it out for yourself? Consider attending some of the hands-on workshops at the TDWI Analytics Experience, July 26-31, 2015, in Boston, or read the TDWI Checklist Reports "Eight Steps for Using Analytics to Gain Value from Text and Unstructured Content" and "How to Gain Insight from Text."

About the Author


Fern Halper, Ph.D., is well known in the analytics community, having
published hundreds of articles, research reports, speeches, Webinars, and more
on data mining and information technology over the past 20 years. Halper is
also co-author of several Dummies books on cloud computing, hybrid cloud,
and big data. She is the director of TDWI Research for advanced analytics,
focusing on predictive analytics, social media analysis, text analytics, cloud
computing, and big data analytics approaches. She has been a partner at
industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her
Ph.D. is from Texas A&M University. You can reach her at fhalper@tdwi.org, on
Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.
