
Natural Language Processing

NLP Intro
• Language is meant for communicating about the world.
• By studying language, we can come to understand more about the world.
• If we can succeed at building computational models of language, we will have a powerful tool for communicating about the world.
• We look at how we can exploit knowledge about the world, in combination with linguistic facts, to build computational natural language systems.
• The NLP problem can be divided into two tasks:
– Processing written text, using lexical, syntactic and semantic knowledge of the language as well as the required real-world information.
– Processing spoken language, using all the information needed above plus additional knowledge about phonology, as well as enough added information to handle the further ambiguities that arise in speech.
NLP Intro
• The problem: English sentences are incomplete descriptions of the information that they are intended to convey.
• "Some dogs are outside" is incomplete – it can mean:
– Some dogs are on the lawn.
– Three dogs are on the lawn.
– Moti, Hira & Buzo are on the lawn.
• The good side: Language allows speakers to be as vague or as precise as they like. It also allows speakers to leave out things that the hearers already know.
NLP Intro
• The problem: The same expression means different things in different contexts.
– Where's the water? (In a lab, it must be pure.)
– Where's the water? (When you are thirsty, it must be potable or drinkable.)
– Where's the water? (When dealing with a leaky roof, it can be filthy.)
• The good side: Language lets us communicate about an infinite world using a finite number of symbols.
• The problem: There are lots of ways to say the same thing:
– Mary was born on October 11.
– Mary's birthday is October 11.
• The good side: When you know a lot, facts imply each other. Language is intended to be used by agents who know a lot.
Steps in NLP
1. Morphological Analysis:
• Individual words are analyzed into their components, and non-word tokens such as punctuation are separated from the words.
• Tries to extract the root word from the declined or inflected form of a word after removing suffixes and prefixes. Ex: getting the root "push" from the declined forms pushes, pushed, pushing, etc. (a small sketch follows below).
• Assigns appropriate syntactic categories such as noun, verb, adjective, etc. to all words in the sentence.
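The following is a minimal sketch of this step, assuming the NLTK library is installed (its tokenizer and tagger data packages must be downloaded once via nltk.download()): a Porter stemmer strips inflectional endings to recover the root, and a part-of-speech tagger assigns syntactic categories.

    import nltk
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["pushes", "pushed", "pushing"]:
        print(word, "->", stemmer.stem(word))      # each inflected form reduces to the root "push"

    # Tokenization separates punctuation from the words; tagging assigns categories.
    tokens = nltk.word_tokenize("The cute girl sings a song.")
    print(nltk.pos_tag(tokens))   # e.g. [('The', 'DT'), ('cute', 'JJ'), ('girl', 'NN'), ('sings', 'VBZ'), ...]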
2. Syntactic Analysis:
• Uses the result of morphological analysis to build a structural description of the sentence based on grammatical rules. This step is called parsing.
• For example, the syntactic analyzer would reject the sentence "Boy the go the to store".
• Creating a parse tree is the first step towards understanding a sentence (a brief parsing sketch follows below).
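As an illustration, the following sketch assumes the spaCy library with its small English model en_core_web_sm (installed separately; neither is part of these notes). The parser assigns each word a category and a grammatical relation to another word of the sentence.

    import spacy

    nlp = spacy.load("en_core_web_sm")   # small English model, downloaded separately
    doc = nlp("John ate the apple")

    for token in doc:
        # word, part of speech, dependency relation, and the word it attaches to
        print(token.text, token.pos_, token.dep_, token.head.text)
    # e.g. "John" comes out as the subject (nsubj) of "ate" and "apple" as its object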
Steps in NLP
3. Semantic Analysis:
• The structures created by the syntactic analyzer are assigned meanings.
• It maps individual words onto corresponding objects in the knowledge base and combines the words with each other using semantic rules (a toy sketch follows below).
• "Colorless green ideas sleep furiously" will be rejected because it is semantically anomalous.
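A toy illustration of this idea (not a real semantic analyzer; the miniature lexicon and the single rule below are invented for this sketch): words are mapped to knowledge-base objects with features, and a semantic rule rejects anomalous combinations such as "ideas sleep".

    # Hypothetical miniature knowledge base: each word maps to an object with features.
    knowledge_base = {
        "girl":  {"kind": "entity", "animate": True},
        "ideas": {"kind": "entity", "animate": False},
        "eats":  {"kind": "action", "needs_animate_subject": True},
        "sleep": {"kind": "action", "needs_animate_subject": True},
    }

    def subject_verb_ok(subject, verb):
        # Semantic rule: if the action requires an animate subject, the subject must be animate.
        subj = knowledge_base[subject]
        act = knowledge_base[verb]
        return (not act["needs_animate_subject"]) or subj["animate"]

    print(subject_verb_ok("girl", "eats"))    # True  -> semantically acceptable
    print(subject_verb_ok("ideas", "sleep"))  # False -> semantically anomalous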

4. Discourse Integration:
• The meaning of an individual sentence may depend on the sentences that precede it and may influence the meanings of the sentences that follow it.
• Ex: the word "it" in the sentence "John wanted it" depends upon the prior discourse context, and that sentence may in turn influence the meaning of the later sentence "He always had."
Steps in NLP
5. Pragmatic Analysis:
• It refers to the intended meaning of sentences used in different contexts; the context affects the interpretation of the sentence.
• Ex: John saw Mike in the garden with a cat.
• The structure representing what was said is reinterpreted to determine what was actually meant.
• Ex: "Do you know what time it is?" should be interpreted as a request to be told the time.
Syntactic Processing
• Syntactic Processing is the step in which a flat input sentence is converted
into a hierarchical structure that corresponds to the units of meaning in the
sentence.

• This process is called parsing.

• It plays an important role in natural language understanding systems for


two reasons:

– Semantic processing must operate on sentence constituent. If there is


no syntactic parsing step, then the semantics system must decide on its
own constituents.
– If parsing is done, on the other hand, it constrains the number of
constituents that semantics can consider. Syntactic parsing is
computationally less expensive than is semantic processing. Thus it can
play a significant role in reducing overall system complexity.
Syntactic Processing
– Although it is often possible to extract the meaning of a sentence without using grammatical facts, it is not always possible to do so. Consider the examples:
• The satellite orbited Mars.
• Mars orbited the satellite.
• Almost all the systems that are actually used have two main components:
– A declarative representation, called a grammar, of the syntactic facts about the language.
– A procedure, called a parser, that compares the grammar against input sentences to produce parsed structures.
Grammars and Parsers
• In the context of NLP, parsing means analyzing a sentence syntactically to assign syntactic tags (subject, verb, object, etc.), to provide constituent structure (noun phrase, verb phrase, etc.), or to characterize the syntactic relations between words.
• Parsing techniques are further divided into rule-based parsing and statistical parsing.
Rule-based parsing
• In rule-based parsing, the syntactic structure of the language is provided in the form of linguistic rules, which can be coded as production rules similar to context-free rules.
• Production rules are defined using non-terminal and terminal symbols.
Statistical parsing
• Requires large corpora; linguistic knowledge is represented as statistical parameters or probabilities (a toy sketch follows below).
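A toy sketch of statistical parsing, assuming NLTK. The grammar and its rule probabilities below are invented for illustration; a real statistical parser would estimate them from a large parsed corpus (a treebank).

    import nltk

    pcfg = nltk.PCFG.fromstring("""
    S  -> NP VP         [1.0]
    NP -> Det Noun      [0.6]
    NP -> Det Adj Noun  [0.4]
    VP -> Verb NP       [0.7]
    VP -> Verb          [0.3]
    Det  -> 'the' [0.5] | 'a' [0.5]
    Noun -> 'girl' [0.5] | 'song' [0.5]
    Adj  -> 'cute' [1.0]
    Verb -> 'sings' [1.0]
    """)

    parser = nltk.ViterbiParser(pcfg)
    for tree in parser.parse("the cute girl sings a song".split()):
        print(tree)          # the most probable parse tree
        print(tree.prob())   # its probability under the grammar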
Grammars and Parsers
• Once the grammar rules are defined, a sentence is parsed using the grammar and a tree-like structure is built if the sentence is syntactically correct. This tree is called a parse tree.
• Parsing can be done by two methods: top-down parsing and bottom-up parsing (a sketch of both follows below).
• Bottom-up parsing: we start with the words in the sentence and apply grammar rules in the backward direction until a single tree is produced whose root matches the start symbol.
• Top-down parsing: we start with the start symbol and apply grammar rules in the forward direction until the terminal symbols of the parse tree correspond to the words in the sentence.
• The choice between these two approaches is similar to the choice between forward and backward reasoning in other problem-solving tasks.
• The most important consideration is the branching factor: is it greater going backward or forward?
• Sometimes these two approaches are combined into a single method called "bottom-up parsing with top-down filtering".
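A minimal sketch, assuming NLTK, contrasting the two strategies on a tiny grammar: RecursiveDescentParser works top-down from the start symbol S, while ShiftReduceParser works bottom-up from the words. Note that NLTK's shift-reduce parser is greedy and does not backtrack, so it can miss parses of longer sentences.

    import nltk

    grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det Noun
    VP -> Verb
    Det  -> 'the'
    Noun -> 'girl'
    Verb -> 'sings'
    """)

    sentence = "the girl sings".split()

    top_down  = nltk.RecursiveDescentParser(grammar)  # expands rules forward from S
    bottom_up = nltk.ShiftReduceParser(grammar)       # reduces words backward towards S

    print(list(top_down.parse(sentence)))   # both find (S (NP (Det the) (Noun girl)) (VP (Verb sings)))
    print(list(bottom_up.parse(sentence)))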
Grammars and Parsers
• Consider the simple context-free grammar for English below.

Rules                         Dictionary Words
<S>  -> <NP><VP>              <Det>  -> a | an | the
<NP> -> <Det><Noun>           <Noun> -> girl | apple | song
<NP> -> <Det><Adj><Noun>      <Adj>  -> cute | smart
<NP> -> <Adj><Noun>           <Verb> -> sings | eats | ate
<VP> -> <Verb>
<VP> -> <Verb><NP>

• The symbol -> is used for 'defined as'.
• The vertical bar | separates alternative definitions (OR).
• S stands for sentence.
• NP for noun phrase.
• VP for verb phrase.
• Simple sentences recognized by this grammar are: a) The girl eats an apple. b) The cute girl sings a song.
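The grammar above can be encoded directly; the sketch below assumes NLTK and uses a chart parser to accept "the cute girl sings a song" and to reject a scrambled word order.

    import nltk

    grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det Noun | Det Adj Noun | Adj Noun
    VP -> Verb | Verb NP
    Det  -> 'a' | 'an' | 'the'
    Noun -> 'girl' | 'apple' | 'song'
    Adj  -> 'cute' | 'smart'
    Verb -> 'sings' | 'eats' | 'ate'
    """)

    parser = nltk.ChartParser(grammar)

    # A sentence covered by the grammar yields a parse tree.
    for tree in parser.parse("the cute girl sings a song".split()):
        tree.pretty_print()

    # A scrambled word order yields no tree, i.e. the sentence is rejected.
    print(list(parser.parse("girl the sings song a".split())))   # -> []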
A parse tree
• John ate the apple.

Grammar rules:
1. S -> NP VP
2. VP -> V NP
3. NP -> NAME
4. NP -> ART N
5. NAME -> John
6. V -> ate
7. ART -> the
8. N -> apple

Parse tree: S branches into NP and VP; NP -> NAME -> "John"; VP -> V NP, with V -> "ate" and NP -> ART N -> "the apple".
Parse tree
• The cute girl ate an apple.

Parse tree: S branches into NP and VP; NP -> Det Adj Noun covering "The cute girl"; VP -> Verb NP, with Verb -> "ate" and NP -> Det Noun covering "an apple". Top-down parsing builds this tree from S down to the words; bottom-up parsing builds it from the words up to S.
Grammars and Parsers
• The first rule can be read as "A sentence is composed of a noun phrase followed by a verb phrase"; the vertical bar means OR; ε represents the empty string.
• Symbols that are further expanded by rules are called non-terminal symbols.
• Symbols that correspond directly to strings that must be found in an input sentence are called terminal symbols.
• Pure context-free grammars are not effective for describing natural languages.
• NLP systems have less in common with computer language processing systems such as compilers.
Discourse and Pragmatic processing
• There are a number of important relationships that may hold between phrases and parts of their discourse contexts, including:

• Identical entities. Consider the text:
– Bill had a red balloon.
– John wanted it.
– The word "it" should be identified as referring to the red balloon. This type of reference is called anaphora (a toy resolution sketch follows after this slide).

• Parts of entities. Consider the text:
– Sue opened the book she just bought.
– The title page was torn.
– The phrase "title page" should be recognized as part of the book that was just bought.
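A toy sketch of anaphora resolution for the balloon example above. The tiny hand-coded lexicon and the recency-plus-animacy heuristic are invented for illustration; real systems use much richer syntactic, semantic and discourse knowledge.

    # Hand-coded toy lexicon: a few nouns with an animacy feature.
    NOUNS = {
        "bill":    {"animate": True},
        "john":    {"animate": True},
        "balloon": {"animate": False},
        "book":    {"animate": False},
    }
    PRONOUNS = {"it": {"animate": False}, "he": {"animate": True}}

    def resolve_pronouns(sentences):
        history = []   # noun phrases mentioned so far, oldest first
        links = []
        for sentence in sentences:
            for word in sentence.lower().strip(".").split():
                if word in PRONOUNS:
                    # Pick the most recent noun whose animacy agrees with the pronoun.
                    for noun in reversed(history):
                        if NOUNS[noun]["animate"] == PRONOUNS[word]["animate"]:
                            links.append((word, noun))
                            break
                elif word in NOUNS:
                    history.append(word)
        return links

    print(resolve_pronouns(["Bill had a red balloon.", "John wanted it."]))
    # [('it', 'balloon')]  -- "it" is linked to the red balloon, as in the example above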
Discourse and Pragmatic processing
• Parts of actions. Consider the text:
– John went on a business trip to New York.
– He left on an early morning flight.
– Taking a flight should be recognized as part of going on a trip.

• Entities involved in actions. Consider the text:
– My house was broken into last week.
– They took the TV and the stereo.
– The pronoun "they" should be recognized as referring to the burglars who broke into the house.

• Elements of sets. Consider the text:
– The decals we have in stock are stars, the moon, item and a flag.
– I'll take two moons.
– "Moons" means moon decals.
Discourse and Pragmatic processing
• Names of individuals:
– Dave went to the movies.

• Causal chains:
– There was a big snow storm yesterday.
– The schools were closed today.

• Planning sequences:
– Sally wanted a new car.
– She decided to get a job.

• Implicit presuppositions:
– Did Joe fail CSE402?
Discourse and Pragmatic processing
• We focus on using the following kinds of knowledge:
– The current focus of the dialogue
– A model of each participant's current beliefs
– The goal-driven character of dialogue
– The rules of conversation shared by all participants.
