
Seminar on Natural Language Processing

Submitted to: Prof. Anu Gupta
Submitted by: Joginder
Class: MCA 5th Sem
Roll No.: 10
What is Natural Language Processing?
Natural Language Processing (NLP) is an AI method of communicating with intelligent systems using a natural language such as English.
The field of NLP involves making computers perform useful tasks with the natural languages humans use. The input and output of an NLP system can be:
Speech
Written text
Components of NLP
NLP has two components:
Natural Language Understanding (NLU)
Understanding involves the following tasks:
Mapping the given input in natural language into useful representations.
Analysing different aspects of the language.
Natural Language Generation (NLG)
NLG is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation. It involves:
Text planning: retrieving the relevant content from the knowledge base.
Sentence planning: choosing the required words, forming meaningful phrases, and setting the tone of the sentence.
Text realization: mapping the sentence plan into sentence structure.
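The three NLG steps above can be sketched as a toy pipeline in Python. The knowledge base, the relation names, the word choices, and the functions `plan_text`, `plan_sentence`, and `realize` are all hypothetical, chosen only for illustration; setting the tone of a sentence is not modelled here, and real NLG systems are far richer.

```python
# Hypothetical knowledge base of (subject, relation, object) facts.
KNOWLEDGE_BASE = [
    ("Paris", "capital_of", "France"),
    ("Paris", "located_on", "Seine"),
    ("Berlin", "capital_of", "Germany"),
]

# Hypothetical word choices for each relation (used in sentence planning).
LEXICAL_CHOICES = {
    "capital_of": "is the capital of",
    "located_on": "lies on the river",
}

def plan_text(topic):
    """Text planning: retrieve the relevant facts about the topic."""
    return [fact for fact in KNOWLEDGE_BASE if fact[0] == topic]

def plan_sentence(fact):
    """Sentence planning: choose the words for one fact, in order."""
    subject, relation, obj = fact
    return [subject, LEXICAL_CHOICES[relation], obj]

def realize(sentence_plan):
    """Text realization: turn a sentence plan into a finished sentence."""
    return " ".join(sentence_plan) + "."

def generate(topic):
    return " ".join(realize(plan_sentence(fact)) for fact in plan_text(topic))

print(generate("Paris"))
# Paris is the capital of France. Paris lies on the river Seine.
```

Keeping each step in its own function mirrors the division of labour described above: changing the wording only touches sentence planning, while changing what is said only touches text planning.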
Stages of NLP
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Discourse Integration
Pragmatic Analysis
Lexical Analysis
Lexical analysis involves identifying and analysing the structure of words. It divides the whole chunk of text into paragraphs, sentences, and words.
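A minimal sketch of this splitting in Python, assuming that paragraphs are separated by blank lines and that a sentence ends with ".", "!" or "?" followed by whitespace. These are simplifying assumptions; trained sentence detectors handle cases such as abbreviations ("Dr.") that this sketch would split incorrectly.

```python
import re

def lexical_split(text):
    """Split text into paragraphs, each paragraph into sentences, each sentence into words."""
    # Assumption: a blank line separates paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Assumption: a sentence ends at ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        # Keep runs of letters, digits and apostrophes as words; drop punctuation.
        result.append([re.findall(r"[A-Za-z0-9']+", s) for s in sentences])
    return result

text = "The boy ate an orange. It was sweet!\n\nNLP is fun."
print(lexical_split(text))
# [[['The', 'boy', 'ate', 'an', 'orange'], ['It', 'was', 'sweet']], [['NLP', 'is', 'fun']]]
```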
Lexicon: a dictionary of words containing syntactic, semantic, and pragmatic information.
Example lexicon (3s means third person singular):
Word     Type         Features
-------------------------------------------------------------
a        Determiner   {3s}
be       Verb         Trans: Intransitive
boy      Noun         {3s}
can      Noun         {1s, 2s, 3s, 1p, 2p, 3p}
         Verb         Trans: Intransitive
orange   Adjective    {3s}
         Noun
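The example lexicon can be held in a Python dictionary that maps each word to its list of possible (type, features) entries. Storing the features as the literal strings from the table is an assumption made for illustration; an ambiguous word such as "can" or "orange" simply has more than one entry.

```python
# The example lexicon: word -> list of (type, features) entries, copied from the table.
LEXICON = {
    "a":      [("Determiner", "{3s}")],
    "be":     [("Verb", "Trans: Intransitive")],
    "boy":    [("Noun", "{3s}")],
    "can":    [("Noun", "{1s, 2s, 3s, 1p, 2p, 3p}"),
               ("Verb", "Trans: Intransitive")],
    "orange": [("Adjective", "{3s}"),
               ("Noun", "")],
}

def lookup(word):
    """Return every (type, features) entry for a word, or [] if it is not in the lexicon."""
    return LEXICON.get(word.lower(), [])

def word_types(word):
    """Return only the possible word types of a word."""
    return [word_type for word_type, _ in lookup(word)]

print(word_types("can"))     # ['Noun', 'Verb']
print(word_types("Orange"))  # ['Adjective', 'Noun']
print(word_types("xyz"))     # []
```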
Syntactic Analysis (Parsing)
[Figures: Top-down parsing; Parse tree]
Basic parsing technique
- Determines the syntactic structure of a sentence.
- It is the inverse of the sentence generation process.
Parser: uses the lexicon to determine the meaning of each word.
[Figure: Input string → Parser → Output representation structure; the parser consults the lexicon]
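Top-down parsing can be sketched as a small recursive-descent parser in Python: it starts from the sentence symbol S, expands grammar rules before looking at the words, and consults a lexicon for each word's type, backtracking when an alternative fails. The grammar and lexicon below are hypothetical toy examples, not part of any library.

```python
# Hypothetical toy grammar: each non-terminal maps to its alternative expansions.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}
# Hypothetical toy lexicon: word -> set of word types (pre-terminals).
LEXICON = {
    "the": {"Det"}, "a": {"Det"}, "an": {"Det"},
    "boy": {"N"}, "orange": {"N"}, "apples": {"N"},
    "ate": {"V"}, "likes": {"V"},
}

def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can cover words starting at pos."""
    if symbol not in GRAMMAR:  # pre-terminal: match one word via the lexicon
        if pos < len(words) and symbol in LEXICON.get(words[pos], set()):
            yield (symbol, words[pos]), pos + 1
        return
    for expansion in GRAMMAR[symbol]:  # top-down: expand the rule, then match words
        yield from expand(expansion, words, pos, [symbol])

def expand(symbols, words, pos, tree):
    """Match a sequence of symbols left to right, backtracking over alternatives."""
    if not symbols:
        yield tuple(tree), pos
        return
    for child, next_pos in parse(symbols[0], words, pos):
        yield from expand(symbols[1:], words, next_pos, tree + [child])

def parse_sentence(sentence):
    """Return the first parse tree covering the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None

print(parse_sentence("the boy ate an orange"))
# ('S', ('NP', ('Det', 'the'), ('N', 'boy')), ('VP', ('V', 'ate'), ('NP', ('Det', 'an'), ('N', 'orange'))))
```

The nested tuples are a textual form of the parse tree; a sentence the grammar cannot derive, such as "ate the boy", returns None. A naive recursive-descent parser like this one loops forever on left-recursive rules (for example NP → NP PP), so such rules would have to be rewritten first.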
Semantic Analysis
It draws the exact meaning, or the dictionary meaning, from the text, and the text is checked for meaningfulness. This is done by mapping syntactic structures onto objects in the task domain. The semantic analyser disregards sentences such as "hot ice-cream".
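One simple way to sketch this check is with selectional restrictions: each adjective lists the noun features it cannot combine with, and a phrase is rejected when the noun carries one of them. The feature tables below are a hypothetical toy example, not a standard resource.

```python
# Hypothetical semantic features for a few nouns.
NOUN_FEATURES = {
    "ice-cream": {"food", "frozen"},
    "tea": {"food", "drink"},
    "boy": {"human", "animate"},
}
# Hypothetical restrictions: noun features each adjective is incompatible with.
ADJECTIVE_CONFLICTS = {
    "hot": {"frozen"},
    "sweet": set(),
    "tall": {"food"},
}

def is_meaningful(adjective, noun):
    """Accept an adjective-noun phrase unless the noun has a conflicting feature."""
    conflicts = ADJECTIVE_CONFLICTS.get(adjective, set())
    return not (conflicts & NOUN_FEATURES.get(noun, set()))

print(is_meaningful("hot", "tea"))        # True
print(is_meaningful("hot", "ice-cream"))  # False: ice-cream is frozen
print(is_meaningful("tall", "boy"))       # True
```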
Discourse Integration
The meaning of any sentence depends upon the meaning of the sentence just before it. In addition, it also influences the meaning of the sentence that immediately follows.
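A very naive sketch of this dependence in Python: a pronoun such as "it" or "she" is interpreted as the most recent matching noun mentioned before it, possibly in an earlier sentence. The tiny noun list, the whitespace word splitting, and the "most recent noun" rule are simplifying assumptions; real coreference resolution is considerably more involved.

```python
# Hypothetical nouns, each tagged with the pronoun that can refer to it.
NOUNS = {"boy": "he", "girl": "she", "orange": "it", "ball": "it"}
PRONOUNS = {"he", "she", "it"}

def resolve_pronouns(sentences):
    """Replace each pronoun with the most recent matching noun mentioned before it."""
    history = []  # nouns seen so far, in order of mention, across sentences
    resolved = []
    for sentence in sentences:
        out = []
        for word in sentence.split():
            lower = word.lower()
            if lower in PRONOUNS:
                # Search backwards for the latest noun compatible with the pronoun.
                match = next((n for n in reversed(history) if NOUNS[n] == lower), None)
                out.append(match if match else word)
            else:
                if lower in NOUNS:
                    history.append(lower)
                out.append(word)
        resolved.append(" ".join(out))
    return resolved

print(resolve_pronouns(["The boy threw the ball", "It hit the girl", "She laughed"]))
# ['The boy threw the ball', 'ball hit the girl', 'girl laughed']
```

Read on its own, "It hit the girl" is ambiguous; only the preceding sentence tells us that "it" is the ball.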
Pragmatic Analysis
During pragmatic analysis, what was said is re-interpreted according to what it actually meant. It involves deriving those aspects of language which require real-world knowledge.
How the OpenNLP Tools Work
Tokenization
The first step in this process is to split the sentence into "tokens", that is, words and punctuation marks.
This tokenizer splits words that contain contractions: for example, it splits "don't" into "do" and "n't", because it is designed to pass these tokens on to the other NLP tools, where "do" is recognized as a verb and "n't" as a contraction of "not", an adverb modifying the preceding verb "do".
The "Tokenize" button in the Tools Example splits text
in the top textbox into sentences, then tokenizes
each sentence. The output, in the lower textbox,
places pipe characters between the tokens.
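The splitting behaviour described above can be approximated with a regular expression in Python. This is not OpenNLP's tokenizer, which uses a trained model; it is a rule-based sketch that assumes English contractions end in "n't", "'s", "'re", "'ll", "'ve", "'d" or "'m", and it joins the tokens with pipe characters to mimic the Tools Example output.

```python
import re

# Alternatives are tried left to right, so contraction rules come before plain words.
TOKEN_PATTERN = re.compile(
    r"\w+(?=n't\b)"            # the part before n't, e.g. "do" in "don't"
    r"|n't\b"                  # the negative contraction itself
    r"|'(?:s|re|ll|ve|d|m)\b"  # other contraction suffixes, e.g. 's and 're
    r"|\w+"                    # ordinary words and numbers
    r"|[^\w\s]"                # any single punctuation character
)

def tokenize(sentence):
    return TOKEN_PATTERN.findall(sentence)

def show_tokens(sentence):
    """Mimic the Tools Example output: tokens separated by pipe characters."""
    return "|".join(tokenize(sentence))

print(show_tokens("I don't like it, it's cold."))
# I|do|n't|like|it|,|it|'s|cold|.
```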
Part-of-speech tagging
Part-of-speech tagging is the act of assigning a part of speech (sometimes abbreviated POS) to each word in a sentence.
POS tags consist of coded abbreviations conforming to the scheme of the Penn Treebank.
The POS tagger was trained using text from the Wall Street Journal and the Brown Corpus. It is possible to further control the POS tagger by providing it with a POS lookup list.
The standard POS tagger does not use a lookup list, but the full parser
does. The lookup list consists of a text file with a word and its possible
POS tags on each line. This means that if a word in the sentence you are
tagging is found in the lookup list, the POS tagger can restrict the list of
possible POS tags to those specified in the lookup list, making it more
likely to choose the correct tag.
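The effect of a lookup list can be sketched in Python. The list format follows the description above (a word and its possible tags on each line), but the tag scores stand in for a trained model's probabilities and are hypothetical numbers, so this is a toy illustration rather than OpenNLP's actual tagger.

```python
def load_lookup_list(lines):
    """Parse lines of the form 'word TAG1 TAG2 ...' into {word: set of allowed tags}."""
    lookup = {}
    for line in lines:
        parts = line.split()
        if parts:
            lookup[parts[0]] = set(parts[1:])
    return lookup

def choose_tag(word, model_scores, lookup):
    """Pick the best-scoring tag, restricted to the lookup list when the word is listed."""
    candidates = model_scores
    if word in lookup:
        allowed = {tag: s for tag, s in model_scores.items() if tag in lookup[word]}
        if allowed:  # fall back to every tag if none of the listed tags were scored
            candidates = allowed
    return max(candidates, key=candidates.get)

# A lookup list with two entries, using Penn Treebank tags.
lookup = load_lookup_list(["can MD NN VB", "orange JJ NN"])

# Hypothetical model scores for "orange" in some sentence (not real OpenNLP output).
scores = {"VB": 0.5, "NN": 0.3, "JJ": 0.2}
print(choose_tag("orange", scores, {}))      # VB  (no lookup list)
print(choose_tag("orange", scores, lookup))  # NN  (VB is not listed for "orange")
```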
Finding phrases ("chunking")
The OpenNLP chunker tool will group
the tokens of a sentence into larger
chunks, each chunk corresponding
to a syntactic unit such as a noun
phrase or a verb phrase. This is the
next step on the way to full parsing,
but it could also be useful in itself
when looking for units of meaning in
a sentence larger than the individual
words. To perform the chunking task,
a POS tagged set of tokens is
required.
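A rule-based sketch of noun-phrase chunking in Python, assuming the tokens already carry Penn Treebank POS tags. OpenNLP's chunker is a trained model that also finds verb phrases, so the single rule used here (an optional determiner, any adjectives, then one or more nouns) is a simplifying assumption.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into noun-phrase chunks matching DT? JJ* NN+.

    Each item of the result is either ("NP", [words]) or an unchunked (word, tag) pair.
    """
    result, i = [], 0
    while i < len(tagged):
        j = i
        if j < len(tagged) and tagged[j][1] == "DT":         # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":      # any number of adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] in NOUN_TAGS:  # one or more nouns
            k += 1
        if k > j:  # at least one noun: emit a noun-phrase chunk
            result.append(("NP", [word for word, _ in tagged[i:k]]))
            i = k
        else:      # no noun phrase starts here: keep the token as it is
            result.append(tagged[i])
            i += 1
    return result

sentence = [("The", "DT"), ("little", "JJ"), ("boy", "NN"), ("ate", "VBD"),
            ("an", "DT"), ("orange", "NN"), (".", ".")]
print(chunk_noun_phrases(sentence))
# [('NP', ['The', 'little', 'boy']), ('ate', 'VBD'), ('NP', ['an', 'orange']), ('.', '.')]
```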
Full parsing
Producing a full parse tree is a task that builds on the NLP algorithms above but goes further, grouping the chunked phrases into a tree diagram that illustrates the structure of the sentence. The full parse algorithms implemented by the OpenNLP library use the sentence splitting and tokenizing steps, but perform the POS tagging and chunking as part of a separate but related procedure driven by the models in the "Parser" subfolder of the "Models" folder. The full parse POS-tagging step uses a tag lookup list.
[Figure: Online parser outputs]
Applications
Machine Translation
As the world's information moves online, the task of making that data accessible, including across languages, becomes increasingly important.
Fighting Spam
Bayesian spam filtering is a statistical technique in which the incidence of words in an email is measured against their typical occurrence in a corpus of spam and non-spam emails.
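The Bayesian spam filtering idea above can be sketched as a small multinomial naive Bayes classifier in Python. The training emails are made-up examples, and add-one (Laplace) smoothing is one common choice assumed here rather than a detail from the source; log-probabilities are summed so that long emails do not underflow.

```python
import math
from collections import Counter

def train(emails):
    """emails: list of (text, label) pairs, label 'spam' or 'ham'. Returns word and email counts."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in emails:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("project meeting notes", "ham"),
]
model = train(training)
print(classify("win cheap money", model))         # spam
print(classify("notes from the meeting", model))  # ham
```

Words never seen in training (such as "from" and "the") receive the same small smoothed probability under both labels, so they do not decide the outcome.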
Information Extraction
Extracting value from unstructured data.

Summarization
Summarizing the meaning of documents and information.
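One simple technique for this is frequency-based extractive summarization: score each sentence by the average document frequency of its words and keep the top-scoring sentences in their original order. This sketches only that one technique, assuming a tiny hand-picked stop-word list and the same naive sentence splitting used earlier.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "on"}  # assumed tiny list

def content_words(text):
    """Lower-cased words of the text, without stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def summarize(text, num_sentences=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))  # how often each word occurs in the whole text

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / (len(words) or 1)  # average, not total

    # Pick the best-scoring sentences, then restore their original order.
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(best[:num_sentences]))

text = ("NLP helps computers understand language. "
        "NLP systems translate language. The weather is nice today.")
print(summarize(text))
# NLP systems translate language.
```

Averaging rather than summing keeps long sentences from winning simply because they contain more words.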

Question Answering
Answering specific questions posed by humans. A big focus of Google's efforts in NLP has been to recognize natural language questions, extract the meaning, and provide the answer.
Top 5 companies focussing on NLP
(1) NLP in Voice Recognition: Expect Labs was established to build tools that enable companies to create intelligent voice-driven interfaces for any app or device.
(2) NLP in Text Prediction: SwiftKey is an innovative startup that creates text prediction technology designed to significantly boost the accuracy, fluency, and speed of text entry on mobile and computing devices.
(3) NLP in Social Media Analysis: NetBase is an innovative company that uses
the data from the social web to apply social media sentiment analysis using
NLP technologies.
(4) NLP in Predicting Government Legislation: FiscalNote is a technology company that offers products for analyzing political, legal, and regulatory information using NLP and machine learning.
(5) NLP in eCommerce Analysis: Klevu is a Finnish technology startup that offers a smart search function for small and medium-sized web stores and shops.
Thank you
