IBM

Watson Application
Developer Workshop





Lab02
Watson Knowledge Studio:
Building a Machine-learning
Annotator with Watson
Knowledge Studio








January 2017
Duration: 60 minutes
Prepared by Víctor L. Fandiño | IBM Global Business Partners

Overview

You can use Watson Knowledge Studio (WKS) to create a machine-learning model
that understands the linguistic nuances, meaning, and relationships specific to your
industry or to create a rule-based model that finds entities in documents based on
rules that you define.
To become a subject matter expert in a given industry or domain, Watson must be
trained. You can facilitate the task of training Watson with Watson Knowledge
Studio. With Watson Knowledge Studio you can deliver meaningful insights to users
by deploying a trained model in other Watson cloud-based offerings and cognitive
solutions, including AlchemyLanguage, Watson Discovery service and Watson
Explorer.
Watson Knowledge Studio provides easy-to-use tools for annotating unstructured
domain literature, and uses those annotations to create a custom machine-learning
model that understands the language of the domain. The accuracy of the model
improves through iterative testing, ultimately resulting in an algorithm that can
learn from the patterns that it sees and recognize those patterns in large collections
of new documents.
The following steps illustrate how it works:

• Based on a set of domain-specific source documents, the team creates a
type system that defines entity types and relation types for the
information of interest to the application that will use the model.

• A group of two or more human annotators annotate a small set of source
documents to label words that represent entity types, words that
represent relation types between entity mentions, and to identify
coreferences of entity types. Any inconsistencies in annotation are
resolved, and one set of optimally annotated documents is built, which
forms the ground truth.

• The ground truth is used to train a model.

• The trained model is used to find entities, relations, and coreferences in
new, never-seen-before documents.
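
To make the vocabulary above concrete, the following Python sketch models a tiny type system and one piece of ground truth. It is only an illustration: the two entity types, the single relation type, the class and function names, and the example sentence are assumptions chosen for this sketch, and they do not reflect how Watson Knowledge Studio stores its data internally.

```python
from dataclasses import dataclass

# Hypothetical miniature type system (not the KLUE type system used in the lab).
ENTITY_TYPES = {"PERSON", "ORGANIZATION"}
# Each relation type maps to the (source, target) entity types it may connect.
RELATION_TYPES = {"founderOf": ("PERSON", "ORGANIZATION")}


@dataclass(frozen=True)
class Mention:
    """A span of text labeled with an entity type."""
    text: str
    begin: int        # character offset where the mention starts
    end: int          # character offset just past the mention
    entity_type: str


@dataclass(frozen=True)
class Relation:
    """A typed, directed link between two mentions."""
    relation_type: str
    source: Mention
    target: Mention


def make_mention(document: str, surface: str, entity_type: str) -> Mention:
    """Label the first occurrence of `surface` in `document`."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type}")
    begin = document.index(surface)
    return Mention(surface, begin, begin + len(surface), entity_type)


def make_relation(relation_type: str, source: Mention, target: Mention) -> Relation:
    """Connect two mentions, enforcing the type system's constraints."""
    expected = RELATION_TYPES.get(relation_type)
    if expected != (source.entity_type, target.entity_type):
        raise ValueError(f"{relation_type} cannot connect these mentions")
    return Relation(relation_type, source, target)


# Made-up example sentence, annotated the way a human annotator might.
document = "Thomas Watson and IBM appear in this made-up sentence."
person = make_mention(document, "Thomas Watson", "PERSON")
organization = make_mention(document, "IBM", "ORGANIZATION")
relation = make_relation("founderOf", person, organization)
```

The type system acts as a constraint: a mention or relation whose type is not defined in it is rejected, which is why the type system must exist before any annotation starts.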

Additionally, you can build a rule-based model with Watson Knowledge Studio.
Watson Knowledge Studio provides a rules editor that simplifies the process of
finding and capturing common patterns in your documents as rules. You can then
create a model that recognizes the rule patterns, and deploy it for use in other
services.

Objectives

The objective of this lab is to provide an overview of how to build a
machine-learning annotator in WKS, covering the following tasks:

• Create projects

• Create type systems

• Create document sets

• Add dictionaries

• Create tasks for human annotators

• Create dictionary-based annotators and machine-learning annotators

• Deploy the machine-learning annotator to AlchemyLanguage and compare
the results

Prerequisites

The Labs Preparation Guide: Getting Started with IBM Watson APIs & SDKs contains
instructions for getting an IBM Bluemix and Watson Knowledge Studio account.
Also, you will need Postman for testing the deployed annotator. For this lab, use
the latest version of Chrome or Firefox web browsers. For the best performance,
use a screen resolution of at least 1024x1280.
Note: In this lab you will be working in your own Watson Knowledge Studio instance
with the administrator role (ADMIN). That means that you are the only member of
the annotator component team. A real project always requires multiple human
annotators in addition to an administrator or project manager. In this case you will
be performing all the tasks. Because you are the only human annotator, you will not
be able to analyze inter-annotator agreement and adjudicate conflicts in annotated
documents, which is always an important part of the annotator development
workflow. For information about team members in Watson Knowledge Studio and
how to create users, see the section Assembling the team in the WKS Knowledge
Center.

Creating a project

A project defines all of the resources that are required to create a machine-learning
annotator, including training documents, the type system, dictionaries, and
annotations that are added by human annotators. For more information about
project creation, see Creating a project.

1. Log in to Watson Knowledge Studio with your administrator ID

2. In the WKS main page, click Create Project.

3. Give the project a name. You cannot change the project name later, so
choose a short name that reflects your domain content or the purpose of
the annotator component. You can specify a longer description, which
can be changed later. In this lab, we will name the project “wadwWKS”

4. Identify the language of the documents in your project. The documents
that you add to the project, and the dictionaries that you create or import,
must be in the language that you specify. In this example, documents will
be in English. The selected language cannot be changed later

5. In the Component Configuration, leave the Default Tokenizer. The
default tokenizer is more advanced than the dictionary-based tokenizer;
it uses machine-learning to identify the tokens in the source documents
based on the statistical learning it has done in the language of the source
documents. It identifies tokens with more precision because it
understands the more natural and nuanced patterns of language. The
dictionary-based tokenizer identifies tokens based on language rules. The
selected tokenizer cannot be changed later. See Tokenizers for more
details.

6. In the Project Manager Selection, you have the option to add project
managers to the project (the administrator can add or remove project
managers later by editing the project). Only the names of people that you
assigned to the project manager role from the User Account
Management page for the instance are displayed. Since you have access
only to a single user ID, no names will be available, so this entry will be
empty

7. When you are ready, click Create. The project will be created and you will
be directed to the project Type System configuration. To change the
project description or add or remove project managers later, an
administrator can edit the project.

8. Sample Screen shot

Creating a type system

You will now learn how to import and modify a type system within Watson
Knowledge Studio. You must create or import a type system before you begin any
annotation tasks. See Type systems for more information about this topic.

9. Download the en-klue2.zip file to your computer. This file contains an
example KLUE type system for English documents

10. Within your project, click Type System in the banner or the navigation
menu. On the Type System page, click Import

11. Select the en-klue2.zip file from your computer and click Import. The
imported type system is displayed in the table

12. 52 entity types and 2,177 relation types should be imported. You can
browse the type system. You can also edit an entity type. For instance,
locate the MONEY entity type. In the Action section click Edit and in the
Roles column delete the role AWARD. Click Save

Adding documents for annotations

After you finish making changes to the type system, you can begin adding
documents to your project. You will now learn how to add documents to a project
in Watson Knowledge Studio that can be annotated by human annotators. See
Adding documents to a project for more information about adding documents.

13. Download the documents-new.csv file to your computer. This file
contains example documents suitable for importing.

14. Within your project, click Documents in the banner or the navigation
menu. On the Documents page, click Import Document Set

15. Select the documents-new.csv file from your computer and click Import.
The imported file is displayed in the table. The imported document set
should contain 14 documents. You can click the document set in the table
to browse the content of each document in the set. They contain
news about computing technologies and companies.

At this point, as a Project Manager, you are now ready to divide the corpus into
multiple document sets and assign the document sets to different human
annotators. Since you are the only user in the instance, you will create a single
annotation set.

Creating annotation sets

An annotation set is a subset of documents from an imported document set that
you assign to a human annotator. The human annotator annotates the documents
in the annotation set. To later use inter-annotator scores to compare the
annotations that are added by each human annotator, you must assign at least two
human annotators to different annotation sets. You must also specify that some
percentage of documents overlap between the sets.
Note: In a real project, you would create as many annotation sets as needed, based
on the number of human annotators working in the project. In this lab, you will
create just one annotation set that you will also annotate.
For more information about annotation sets, see Creating and assigning annotation
sets.

16. Within your project, click Documents in the banner or the navigation
menu

17. Click Create Annotation Sets. The Create Annotation Sets window opens.
By default, this window shows the base set (containing all documents), as
well as fields where you can specify the information for a new annotation
set

18. Select your name in the Annotator list and provide a name for the set.
Notice that you could add more sets (and a human annotator for each
one), which is a more realistic situation in a business environment. In the
case of more than one set, the Overlap field specifies the percentage of
documents in the base set to be included in all of the new sets, so they
can be annotated by all annotators and you can compare the results.
Since you only have one set, the overlap has no effect

19. Click Generate

The new annotation set is created and now appears in the Annotation Sets tab of the
Documents page.
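
The role of the Overlap field can be illustrated with a short Python sketch. It assumes that the overlapping documents are simply the first ones in the base set and that the remaining documents are dealt out in turn; the lab does not say how Watson Knowledge Studio actually picks them, and the function name and annotator names are hypothetical.

```python
import math


def create_annotation_sets(documents, annotators, overlap_percent):
    """Split a base set into one annotation set per annotator.

    `overlap_percent` of the base set is copied into every set so that all
    annotators label those documents; the rest is distributed round-robin.
    """
    if not annotators:
        raise ValueError("at least one annotator is required")
    if not 0 <= overlap_percent <= 100:
        raise ValueError("overlap_percent must be between 0 and 100")

    shared_count = math.floor(len(documents) * overlap_percent / 100)
    shared = documents[:shared_count]        # assumption: take the first documents
    remaining = documents[shared_count:]

    sets = {name: list(shared) for name in annotators}
    for index, document in enumerate(remaining):
        sets[annotators[index % len(annotators)]].append(document)
    return sets


# The lab's base set has 14 documents; the annotator names are made up.
base_set = [f"doc{i}" for i in range(14)]
two_annotators = create_annotation_sets(base_set, ["ana", "ben"], 50)
single_annotator = create_annotation_sets(base_set, ["you"], 20)
```

With a single annotator the overlap has no effect, as in this lab: the one set contains all 14 documents whatever percentage is chosen.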

Adding a dictionary

Dictionaries are used in WKS for pre-annotating text when creating a
machine-learning annotator. You will now learn how to add a dictionary to a project in
Watson Knowledge Studio. For more information about dictionaries, see Adding
dictionaries to a project.

20. Download the dictionary-items-organization.csv file to your computer.
This file contains dictionary terms in CSV format, suitable for importing
into a Watson Knowledge Studio dictionary

21. Within your project, click Dictionaries in the banner or the navigation
menu

22. Click the Add icon to add a dictionary

Note: Do not click the Import icon, which is used to import a dictionary that you
want to use as-is. For the lab, you will create a new editable dictionary and then
import terms into it.

23. In the Name field, type “Test dictionary”. Click Save to create the (empty)
dictionary. The new dictionary is created and automatically opened for
editing

24. In the dictionary pane, click Import. In the Import Dictionary Entries
window, select the dictionary-items-organization.csv file from your
computer and then click Import. 24 terms in the file are imported into the
dictionary. Each term represents an organization

25. Click Add Entry to create a new term. An editable row is added at the top

26. In the Surface Forms column, type “IBM” and “International Business
Machines Corporation” on separate lines (when you begin to type a new
surface form, a space is added below for an additional surface form).
Leave the radio button next to IBM selected, indicating that this surface
form is the lemma. In the Part of Speech column, select Noun. Click Save

After you create a dictionary, you can use it to speed up human annotation tasks by
pre-annotating the documents.

Pre-annotating with a dictionary-based annotator

You will now learn how to use a dictionary-based annotator to pre-annotate
documents in Watson Knowledge Studio. For more information about
pre-annotation with dictionary-based annotators, see Pre-annotating documents with
the Dictionary pre-annotator.

27. Within your project, click Annotator Component in the banner or the
navigation menu. You can see different ways to pre-annotate documents

28. Under the description of the Dictionary Pre-annotator type, click Create
this type of pre-annotator. The Dictionary Mapping window opens.

29. The list of entity types you previously imported when creating the type
system appears. You now have to associate each dictionary that you want
the dictionary pre-annotator to use, with the entity type that matches the
type of the dictionary terms. You must map at least one dictionary before
you can run the pre-annotator. Map the ORGANIZATION entity type to
the “Test dictionary” dictionary you created previously: Click Edit for the
ORGANIZATION entity type name. Choose the dictionary from the list

30. Click the plus sign beside the dictionary name to add the mapping, and
then click Save

31. Click Create and then select Create & Run from the drop-down menu

32. On the Run Annotator page, click the check boxes to select the document
set that you created earlier in the lab (not including the base set)

33. Click Run

The documents in the selected set are pre-annotated using the dictionary annotator
you created. The annotator component is added to the Annotator Component page;
you could later use the same annotator to pre-annotate additional document sets
by clicking Run.
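
Conceptually, dictionary pre-annotation looks up every surface form of a dictionary in the text and labels each match with the entity type that the dictionary is mapped to. The Python sketch below illustrates that idea with the IBM entry created earlier. It is an assumption-laden simplification (exact, case-sensitive, whole-word matching; hypothetical function and variable names), not the matching logic that Watson Knowledge Studio actually uses.

```python
import re

# Dictionary entry from the lab: the lemma maps to its surface forms.
TEST_DICTIONARY = {
    "IBM": ["IBM", "International Business Machines Corporation"],
}


def pre_annotate(text, dictionary, entity_type):
    """Return (begin, end, surface, lemma, entity_type) for each match."""
    annotations = []
    for lemma, surface_forms in dictionary.items():
        for surface in surface_forms:
            # Whole-word, case-sensitive match (a simplifying assumption).
            pattern = r"\b" + re.escape(surface) + r"\b"
            for match in re.finditer(pattern, text):
                annotations.append(
                    (match.start(), match.end(), surface, lemma, entity_type)
                )
    return sorted(annotations)


# Made-up sentence; the dictionary is mapped to ORGANIZATION as in step 29.
sample = "IBM, short for International Business Machines Corporation, makes computers."
mentions = pre_annotate(sample, TEST_DICTIONARY, "ORGANIZATION")
```

Both surface forms resolve to the same lemma, so the two matches in the sample sentence are labeled as the same ORGANIZATION entry.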

Creating an annotation task

In this section, you will learn how to use annotation tasks to track the work of
human annotators in Watson Knowledge Studio. For more information about
annotation tasks, see Creating an annotation task.

34. Within your project, click Human Annotation in the banner or the
navigation menu. On the Human Annotation page, click Add Task

35. Specify the details for the task: In the Title field, type “Test”. In the
Deadline field, select a date in the future

36. Click Create

37. In the Add Annotation Sets to Task window, click the check boxes to select
the document set you created previously. This specifies that the
document set must be annotated by the assigned human annotators as
part of this task. Remember that for this lab you only have one human
annotator and the corresponding annotation set. In a real scenario, you
will have multiple annotation sets assigned to different human
annotators in your project

38. Click Create Task

39. Click the Test task to open it. You can use this view to track the progress
of human annotation work, calculate the inter-annotator agreement
scores, and view overlapping documents to adjudicate annotation
conflicts

Annotating documents

In this section, you will learn how to use the Ground Truth Editor to annotate
documents in Watson Knowledge Studio. For more information about human
annotation, see Annotation with the Ground Truth Editor.

40. In the Test task you created in the previous section, click Annotate next
to the Annotation Set 1 annotation set. The Ground Truth Editor opens in
a new browser tab, showing you a preview of each document in the
document set

41. Scroll to the “Technology - gmanews.tv” document and click to open it for
annotation. Note that the term “IBM” has already been annotated with
the ORGANIZATION entity type; this annotation was added by the
previous dictionary pre-annotator process. This pre-annotation is correct,
so it does not need to be modified

42. You will now annotate a mention. Click the Mentions icon to begin
annotating mentions. In the document body, select the text “Thomas
Watson”

43. In the list of entity types, click PERSON. The entity type PERSON is applied
to the selected mention

44. Click the Relations icon to begin annotating relations

45. Select the “Thomas Watson” and “IBM” mentions (in that order). To
select a mention, click the entity-type label above the text

46. In the list of relation types, click founderOf. The two mentions are
connected with a founderOf relationship

47. Select the Completed option from the menu, click the Save icon
to confirm, and then click Close

48. In the list of documents click Submit All to submit the documents for
approval. Once confirmed, you can see that the status of all documents is
Completed

Note: In a real project, you would create many more annotations and complete all
of the documents in the set before submitting.

49. Close the Ground Truth Editor

50. Back in the Human Annotation Tasks window, click the Refresh button.
You can see now that the Annotation Set 1 is in Submitted status

51. Select the check box next to Annotation Set 1; you will see that the Accept
and Cancel buttons appear. Click Accept. You have now promoted the
document set to ground truth

Note: In a real situation you will have several annotation sets reviewed by different
human annotators. You will have to compare their work to determine whether
different human annotators are annotating overlapping documents consistently. In
that situation, Watson Knowledge Studio calculates inter-annotator agreement
(IAA) scores by examining all overlapping documents in all document sets in the
task, regardless of the status of the document sets. The IAA scores show how
different human annotators annotated mentions, relations, and coreference
chains. It is a good idea to check IAA scores periodically and verify that human
annotators are consistent with each other. For example, a human annotator could
have defined the relation between IBM and Thomas Watson as founderOf and
another one as employedBy. The IAA scores will reflect this situation, which you will
have to analyse and discuss with the annotators to adjudicate conflicts. This is
something you can do in the annotation task. For this simple example in the lab,
you are the single annotator and only a minimal set of human annotations has been
made, so no conflicts are present and the annotation set status is completed (the
status would be In Conflict if any overlapping documents were detected when you
select and accept the document sets).
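
As a rough illustration of what an agreement score measures, the Python sketch below compares the relation annotations of two annotators on the same overlapping document and computes a pairwise F1-style score. The formula, the function name, and the sample annotations are assumptions made for this sketch; the lab does not describe how Watson Knowledge Studio computes its IAA scores.

```python
def pairwise_agreement(annotations_a, annotations_b):
    """F1-style agreement between two annotators on the same document.

    Each annotation is a hashable tuple; two annotations agree only if
    they are identical. Returns a value between 0.0 and 1.0.
    """
    set_a, set_b = set(annotations_a), set(annotations_b)
    if not set_a and not set_b:
        return 1.0  # nothing annotated by either side counts as full agreement
    matched = len(set_a & set_b)
    return 2 * matched / (len(set_a) + len(set_b))


# The disagreement described in the note: same mentions, different relation type.
annotator_1 = [
    ("mention", "Thomas Watson", "PERSON"),
    ("mention", "IBM", "ORGANIZATION"),
    ("relation", "Thomas Watson", "founderOf", "IBM"),
]
annotator_2 = [
    ("mention", "Thomas Watson", "PERSON"),
    ("mention", "IBM", "ORGANIZATION"),
    ("relation", "Thomas Watson", "employedBy", "IBM"),
]

overall = pairwise_agreement(annotator_1, annotator_2)
relations_only = pairwise_agreement(
    [a for a in annotator_1 if a[0] == "relation"],
    [a for a in annotator_2 if a[0] == "relation"],
)
```

Scoring mentions and relations separately, as in the last call, makes it easier to see where annotators diverge: here they agree on every mention but on none of the relations.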

Creating and deploying a machine-learning annotator

After you have promoted the documents to ground truth, you can use them to train
the machine-learning annotator. When you create a machine-learning annotator,
you select the document sets that you want to use to train it. You also specify the
percentage of documents that are to be used as training data, test data, and blind
data. Only documents that became ground truth through approval or adjudication
can be used to train the machine-learning annotator.
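
The split into training, test, and blind data can be pictured with the following Python sketch. The 70/20/10 percentages, the fixed random seed, and the function name are illustrative assumptions; they are not the default values offered by Watson Knowledge Studio, which the lab does not list.

```python
import random


def split_ground_truth(documents, train_pct, test_pct, blind_pct, seed=0):
    """Shuffle ground-truth documents and split them into three sets."""
    if train_pct + test_pct + blind_pct != 100:
        raise ValueError("percentages must add up to 100")

    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable

    train_end = round(len(shuffled) * train_pct / 100)
    test_end = train_end + round(len(shuffled) * test_pct / 100)
    return shuffled[:train_end], shuffled[train_end:test_end], shuffled[test_end:]


# 14 documents, as in the lab's document set; the percentages are made up.
ground_truth = [f"doc{i}" for i in range(14)]
train, test, blind = split_ground_truth(ground_truth, 70, 20, 10)
```

The training set is what the annotator learns from, the test set is used to evaluate it while you iterate, and the blind set is held back so that a final evaluation uses documents nobody has tuned against.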

52. On the Annotator Component page, click Create Annotator. On the
Machine Learning Annotator pane, click Create this type of annotator

53. Select the document sets that you want to use for creating a
machine-learning annotator. Click the check mark next to Annotation Set 1. Use the
default values for creating your testing, training, and blind data. Then,
click Next. In the next window, agree to reuse the current dictionary
mapping and click Train & Evaluate

Note: Training might take more than ten minutes, or even hours, depending on the
number of human annotations and the total number of words across documents.

54. You are now back in the Annotator Component window, where you can
see the training progress of your annotator

55. After the machine-learning annotator is trained, you can export it or you
can view detailed information on its performance by clicking Details. On
the Model Settings tab, you have access to the Train / Test / Blind sets
where you can view the documents that human annotators worked on.
You can click View Decoding Results to see the annotations that the
trained machine-learning annotator created on that same set of
documents

56. On the Statistics tab, you can view details about the precision, recall and
F1 scores for the machine-learning annotator. You can view these scores
for mentions, relations, and coreference chains by using the radio
buttons. You can analyse performance by viewing a summary of statistics
for entity types, relation types, and coreference chains. You can also
analyse statistics that are presented in a confusion matrix. The confusion
matrix helps you compare the annotations that were added by the
machine-learning annotator to the annotations in the ground truth.

Note: In this tutorial, you annotated documents with only a single dictionary for
organizations. Therefore, the scores you see are 0 or N/A for most entity types. The
numbers are low, but that is expected, because you did almost no human
annotation or correction.
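
For reference, precision, recall, and F1 follow their standard definitions, which the Python sketch below applies to a set of predicted mentions and a set of ground-truth mentions. The sample mentions and the function name are made up for illustration, and the sketch does not claim to reproduce how the Statistics tab matches annotations.

```python
def score(predicted, ground_truth):
    """Return (precision, recall, f1) for two collections of annotations."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)

    # Precision: share of predicted annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of ground-truth annotations that were found.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: the annotator finds IBM but misses Thomas Watson.
truth = {("IBM", "ORGANIZATION"), ("Thomas Watson", "PERSON")}
found = {("IBM", "ORGANIZATION")}
precision, recall, f1 = score(found, truth)
```

In this example every prediction is correct, so precision is perfect, but half of the ground-truth mentions are missed, so recall and F1 are lower. An entity type with no predictions and no ground-truth mentions gives nothing to score, which is consistent with the 0 or N/A values mentioned in the note.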

57. On the Versions tab, you can take a snapshot of the annotator and the
resources that were used to create it (except for dictionaries and
annotation tasks). For example, you might want to take a snapshot before
you change the annotator. If the statistics are poorer the next time you
run it, you can promote the older version and delete the version that
returned poorer results. Also, if you want to make your annotator
available to other Watson applications, you must create at least one
version of the annotator. This allows you to deploy one version, while you
continue to improve the current version. The option to deploy does not
appear until you create at least one version

58. On the Versions tab, click Take Snapshot. Provide a description of the
snapshot

59. Once the snapshot has been created, click Deploy in the Action section in
the same row as the snapshot. Select AlchemyLanguage as the service
to deploy the model to and click Next

60. In the Deploy Model window, enter your AlchemyLanguage API key.
This is the same one you used in the previous AlchemyLanguage lab (you
can get it from your IBM Bluemix dashboard, where you should have your
AlchemyLanguage service). Click Deploy

61. A confirmation window appears, indicating that the deployment to the
AlchemyLanguage service has started. That window shows your model ID,
which you must use when invoking the AlchemyLanguage methods with
your annotator. Copy this model ID for later use. Click OK

62. You can check the status of the deployment in the Action section next to
the snapshot. Depending on the model, the deployment can take some
time to complete. Once you see that the status is available, your model
is ready to be used by the AlchemyLanguage methods

Using the annotator in AlchemyLanguage (optional)

After you train a machine-learning annotator, you can use it to pre-annotate new
documents that you add to the corpus and you can make it available to other
Watson applications, like AlchemyLanguage, Watson Discovery service or Watson
Explorer.
See Using the machine-learning model to learn how to deploy your annotators to
these IBM Watson applications.
Now that you have deployed your model, you can test it with AlchemyLanguage.
Verify first that the deployment has finished. We are now going to extract named
entities with the AlchemyLanguage GetRankedNamedEntities method.

63. Open a Postman session. Create a new POST HTTP request with the
following parameters:

Method          POST
Endpoint        https://gateway-a.watsonplatform.net/calls/text/TextGetRankedNamedEntities
apikey          <your api Key>
outputMode      xml
emotion         1
sentiment       1
knowledgeGraph  1
model           <your model Id>
text            NCR, which counts later IBM Thomas Watson as one of its early employees, said its products and services account for more than $400 billion in annual commerce and 23 billion consumer self-service transactions

64. Verify that in the Body section of the request, the option x-www-form-
urlencoded is selected

65. Click Send to make the request. You should get an XML output similar to
the following one

66. IBM is recognized as an organization and Watson as a person, the entities
you trained your annotator for.

67. If you try the same request again without the model parameter, you
should see that the extracted entities are much more generic

68. You can try the same exercise now with the AlchemyLanguage
GetTypeRelations method. You will see that Thomas Watson is identified
as founderOf IBM. Try again removing the model parameter and compare
the results
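
If you prefer a script to Postman, the request from step 63 can be sketched in Python with the standard library only. The endpoint and parameter names are the ones listed in the lab; the placeholder values YOUR_API_KEY and YOUR_MODEL_ID and the function names are assumptions that you must replace with your own values, and the sketch assumes that the endpoint is reachable from your network. The request is built but not sent until you call send.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = (
    "https://gateway-a.watsonplatform.net/calls/text/TextGetRankedNamedEntities"
)


def build_request(api_key, model_id, text):
    """Build the form-encoded POST request described in step 63."""
    params = {
        "apikey": api_key,
        "outputMode": "xml",
        "emotion": "1",
        "sentiment": "1",
        "knowledgeGraph": "1",
        "model": model_id,   # omit this parameter to use the generic model
        "text": text,
    }
    body = urlencode(params).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send(request):
    """Send the request and return the response body as text."""
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Placeholder credentials: replace with your API key and model ID.
request = build_request(
    "YOUR_API_KEY",
    "YOUR_MODEL_ID",
    "NCR, which counts later IBM Thomas Watson as one of its early "
    "employees, said its products and services account for more than "
    "$400 billion in annual commerce and 23 billion consumer "
    "self-service transactions",
)
# print(send(request))  # uncomment to call the service
```

Removing the model entry from params reproduces step 67, so you can compare the output of the generic model with the output of your own annotator.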

End of Lab
