
Seminar on Natural Language Processing

SUBMITTED TO: PROF. ANU GUPTA
SUBMITTED BY: JOGINDER
CLASS: MCA 5TH SEM
ROLL NO: 10
What is Natural Language Processing?
NLP is an AI method of communicating with intelligent systems using a natural
language such as English.
The field of NLP involves making computers perform useful tasks with
the natural languages humans use. The input and output of an NLP
system can be:
Speech
Written Text
Components of NLP
NLP has two components, described below.
Natural Language Understanding (NLU)
Understanding involves the following tasks:
Mapping the given input in natural language into useful representations.
Analysing different aspects of the language.
Natural Language Generation (NLG)
It is the process of producing meaningful phrases and sentences in natural language
from some internal representation. It involves:
Text planning: retrieving the relevant content from the knowledge base.
Sentence planning: choosing the required words, forming meaningful phrases, and setting the tone
of the sentence.
Text realization: mapping the sentence plan into sentence structure.
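The three NLG steps above can be sketched as a tiny pipeline. This is only an illustrative toy: the knowledge base entry, the function names and the fixed subject-verb-object template are all assumptions, not part of any real NLG system.

```python
# Hypothetical knowledge base; the entry is invented for illustration.
KNOWLEDGE_BASE = {"boy": {"action": "eat", "object": "orange"}}

def plan_text(topic):
    """Text planning: retrieve the relevant content from the knowledge base."""
    return {"subject": topic, **KNOWLEDGE_BASE[topic]}

def plan_sentence(content):
    """Sentence planning: choose the words and form the phrases."""
    article = "an" if content["object"][0] in "aeiou" else "a"
    return {
        "subject": "the " + content["subject"],
        "verb": content["action"] + "s",  # naive third person singular form
        "object": article + " " + content["object"],
    }

def realize(plan):
    """Text realization: map the sentence plan onto a sentence structure."""
    sentence = "{subject} {verb} {object}.".format(**plan)
    return sentence[0].upper() + sentence[1:]

def generate(topic):
    return realize(plan_sentence(plan_text(topic)))

print(generate("boy"))  # The boy eats an orange.
```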
Stages of NLP
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Discourse Integration
Pragmatic Analysis
Lexical Analysis
It involves identifying and analysing the structure of words. Lexical analysis divides the whole chunk of text into
paragraphs, sentences, and words.
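A minimal sketch of that division, assuming paragraphs are separated by blank lines and sentences end with ".", "!" or "?" followed by whitespace. Real text needs more careful rules (abbreviations, for instance), so treat the function name and the regular expressions as illustrative only.

```python
import re

def split_text(text):
    """Split raw text into paragraphs, then sentences, then words."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        # Words are runs of letters, digits or apostrophes.
        result.append([re.findall(r"[A-Za-z0-9']+", s) for s in sentences])
    return result

structure = split_text("The boy can swim. He is fast.\n\nAn orange is a fruit.")
for number, paragraph in enumerate(structure, start=1):
    print("Paragraph", number, paragraph)
```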
Lexicon: a dictionary of words containing syntactic, semantic and pragmatic information.
Example lexicon (3s means third person singular):
Word     Type         Features
-------------------------------------------------------------------------------
a        Determiner   {3s}
be       Verb         Trans: Intransitive
boy      Noun         {3s}
can      Noun         {1s, 2s, 3s, 1p, 2p, 3p}
         Verb         Trans: Intransitive
orange   Adjective    {3s}
         Noun
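One way to hold such a lexicon in a program is a dictionary that maps each word to a list of entries, one per word type. The sketch below copies the entries from the example lexicon as they appear there; the field names ("type", "features", "trans") and the helper function are assumptions made for illustration.

```python
# Each word maps to one entry per possible word type.
LEXICON = {
    "a": [{"type": "Determiner", "features": {"3s"}}],
    "be": [{"type": "Verb", "trans": "Intransitive"}],
    "boy": [{"type": "Noun", "features": {"3s"}}],
    "can": [
        {"type": "Noun", "features": {"1s", "2s", "3s", "1p", "2p", "3p"}},
        {"type": "Verb", "trans": "Intransitive"},
    ],
    "orange": [
        {"type": "Adjective", "features": {"3s"}},
        {"type": "Noun"},
    ],
}

def word_types(word):
    """Return every word type the lexicon lists for `word` (empty if unknown)."""
    return [entry["type"] for entry in LEXICON.get(word.lower(), [])]

print(word_types("can"))     # ['Noun', 'Verb']
print(word_types("orange"))  # ['Adjective', 'Noun']
```

A word with more than one entry, such as "can", is exactly the kind of ambiguity the later stages have to resolve.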
Syntactic Analysis (Parsing)
Basic parsing technique: determining the syntactic structure of a sentence.
It is the inverse of the sentence generation process.
Parser: uses the lexicon to determine the meaning of a word.
[Figure: top-down parsing and a parse tree. The input string goes into the parser, which consults the
lexicon and produces the output representation (structure).]
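Top-down parsing can be sketched as a small backtracking parser that starts from the sentence symbol S and expands grammar rules until it reaches words found in the lexicon. The grammar, the lexicon entries and the function names below are invented for illustration and cover only a handful of sentences.

```python
# Toy grammar: S -> NP VP, NP -> Det N, VP -> V NP | V
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
# Toy lexicon mapping each word to a single category.
LEXICON = {"the": "Det", "a": "Det", "an": "Det",
           "boy": "N", "orange": "N", "eats": "V"}

def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` can match tokens[pos:]."""
    if symbol not in GRAMMAR:  # word category: consult the lexicon
        if pos < len(tokens) and LEXICON.get(tokens[pos]) == symbol:
            yield (symbol, tokens[pos]), pos + 1
        return
    for rule in GRAMMAR[symbol]:
        yield from expand(rule, tokens, pos, (symbol,))

def expand(rule, tokens, pos, tree):
    """Match the symbols of `rule` left to right, growing `tree` as we go."""
    if not rule:
        yield tree, pos
        return
    for subtree, next_pos in parse(rule[0], tokens, pos):
        yield from expand(rule[1:], tokens, next_pos, tree + (subtree,))

def parse_sentence(sentence):
    """Return the first parse tree that covers the whole sentence, or None."""
    tokens = sentence.lower().split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None

print(parse_sentence("The boy eats an orange"))
```

The result is a nested tuple whose first element is the grammar symbol, which corresponds to the parse tree; a word sequence the grammar cannot derive returns None.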
Semantic Analysis
It draws the exact meaning, or the dictionary meaning, from the text. The text is
checked for meaningfulness. This is done by mapping syntactic structures onto objects in
the task domain. The semantic analyser rejects phrases such as "hot ice-cream".
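The "hot ice-cream" check can be imitated with a toy consistency test: each noun carries known properties, each adjective asserts a property, and a phrase is rejected when the two contradict each other. The property tables and the function name are assumptions made for illustration; a real semantic analyser works on much richer representations.

```python
# Hypothetical world knowledge about a few nouns.
NOUN_PROPERTIES = {
    "ice-cream": {"temperature": "cold"},
    "snow": {"temperature": "cold"},
    "boy": {},
}
# What each adjective claims about the noun it modifies.
ADJECTIVE_CLAIMS = {
    "hot": ("temperature", "hot"),
    "cold": ("temperature", "cold"),
}

def is_meaningful(adjective, noun):
    """Reject an adjective-noun pair whose claim contradicts a known property."""
    claim = ADJECTIVE_CLAIMS.get(adjective)
    if claim is None:
        return True  # nothing known about this adjective
    attribute, value = claim
    known = NOUN_PROPERTIES.get(noun, {}).get(attribute)
    return known is None or known == value

print(is_meaningful("hot", "ice-cream"))   # False
print(is_meaningful("cold", "ice-cream"))  # True
```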
Discourse Integration
The meaning of any sentence depends upon the meaning of
the sentence just before it. In addition, it also influences
the meaning of the sentence that immediately follows it.
Pragmatic Analysis
During this stage, what was said is re-interpreted in terms of what was actually meant. It
involves deriving those aspects of language which require real-world knowledge.
How the OpenNLP tools work
Tokenization
The first step in this process is to split the sentence into
"tokens", that is, words and punctuation marks.
This tokenizer will split words that consist of
contractions: for example, it will split "don't" into "do"
and "n't", because it is designed to pass these tokens on
to the other NLP tools, where "do" is recognized as a
verb, and "n't" as a contraction of "not", an adverb
modifying the preceding verb "do".
The "Tokenize" button in the Tools Example splits text in
the top textbox into sentences, then tokenizes each
sentence. The output, in the lower textbox, places pipe
characters between the tokens.
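The splitting behaviour described above can be approximated with one regular expression. This is a hedged sketch, not the OpenNLP tokenizer: the pattern and function name are assumptions, it handles only "n't" contractions written with a straight apostrophe, and it treats every other punctuation character as its own token.

```python
import re

# Order matters: first the stem before "n't", then "n't" itself,
# then ordinary words, then any single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?=n't)|n't|\w+|[^\w\s]")

def tokenize(sentence):
    """Split a sentence into word, contraction and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)

tokens = tokenize("I don't know.")
print(" | ".join(tokens))  # I | do | n't | know | .
```

Joining the tokens with pipe characters mirrors the output format described for the "Tokenize" button.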
Part-of-speech tagging

Part-of-speech tagging is the act of assigning a part of speech (sometimes abbreviated POS)
to each word in a sentence. The POS tags consist of coded abbreviations conforming to the
scheme of the Penn Treebank.
The POS tagger was trained using text from the Wall Street Journal and the Brown Corpus. It is
possible to further control the POS tagger by providing it with a POS lookup list.

The standard POS tagger does not use a lookup list, but the full parser does. The lookup list
consists of a text file with a word and its possible POS tags on each line. This means that if a
word in the sentence you are tagging is found in the lookup list, the POS tagger can restrict
the list of possible POS tags to those specified in the lookup list, making it more likely to
choose the correct tag.
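The effect of a lookup list can be sketched as a filter over the tagger's candidate tags. The code below is not the OpenNLP API: the function names, the line format ("word TAG TAG ...", following the description above) and the candidate scores are assumptions made for illustration.

```python
def load_lookup(lines):
    """Parse lookup lines of the form 'word TAG1 TAG2 ...' into a dict."""
    lookup = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            lookup[parts[0].lower()] = set(parts[1:])
    return lookup

def choose_tag(word, tag_scores, lookup):
    """Pick the best-scoring tag, restricted to the lookup list if the word is in it."""
    allowed = lookup.get(word.lower())
    candidates = {tag: score for tag, score in tag_scores.items()
                  if allowed is None or tag in allowed}
    if not candidates:  # the lookup list ruled out every candidate
        candidates = tag_scores
    return max(candidates, key=candidates.get)

lookup = load_lookup(["can MD NN VB", "orange JJ NN"])
scores = {"VBZ": 0.5, "NN": 0.3, "MD": 0.2}  # made-up model scores for "can"
print(choose_tag("can", scores, {}))      # VBZ (no lookup list)
print(choose_tag("can", scores, lookup))  # NN  (restricted to MD, NN, VB)
```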
Finding phrases ("chunking")
The OpenNLP chunker tool will group the tokens of a sentence into larger chunks, each chunk
corresponding to a syntactic unit such as a noun phrase or a verb phrase. This is the next step on
the way to full parsing, but it could also be useful in itself when looking for units of meaning in a
sentence larger than the individual words. To perform the chunking task, a POS tagged set of
tokens is required.
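A rule-based sketch shows why chunking needs POS tags as input. It groups an optional determiner, any adjectives and one or more nouns into a noun phrase (NP), and runs of modals and verbs into a verb phrase (VP). The OpenNLP chunker is model-driven, so these hand-written rules and names are only an illustrative stand-in; the tags follow the Penn Treebank scheme.

```python
def is_verb(tag):
    return tag.startswith("VB") or tag == "MD"

def chunk(tagged):
    """Group (word, tag) pairs into NP, VP and O (outside any chunk) spans."""
    chunks, i = [], 0
    while i < len(tagged):
        start = i
        # Noun phrase: optional determiner, adjectives, then at least one noun.
        if tagged[i][1] == "DT":
            i += 1
        while i < len(tagged) and tagged[i][1].startswith("JJ"):
            i += 1
        first_noun = i
        while i < len(tagged) and tagged[i][1].startswith("NN"):
            i += 1
        if i > first_noun:
            chunks.append(("NP", [word for word, _ in tagged[start:i]]))
            continue
        # Verb phrase: a run of modals and verbs.
        i = start
        while i < len(tagged) and is_verb(tagged[i][1]):
            i += 1
        if i > start:
            chunks.append(("VP", [word for word, _ in tagged[start:i]]))
            continue
        # Anything else stays outside the chunks.
        chunks.append(("O", [tagged[start][0]]))
        i = start + 1
    return chunks

tagged = [("The", "DT"), ("little", "JJ"), ("boy", "NN"), ("can", "MD"),
          ("eat", "VB"), ("an", "DT"), ("orange", "NN"), (".", ".")]
for label, words in chunk(tagged):
    print(label, words)
```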
Full parsing
Producing a full parse tree is a task that builds on the NLP algorithms above, but which goes
further in grouping the chunked phrases into a tree diagram that illustrates the structure of the
sentence. The full parse algorithms implemented by the OpenNLP library use the sentence
splitting and tokenizing steps, but perform the POS-tagging and chunking as part of a
separate but related procedure driven by the models in the "Parser" subfolder of the "Models"
folder. The full parse POS-tagging step uses a tag lookup list.
Online Parser Outputs
Applications
Machine Translation: as the world's information moves online, the task of making that data
accessible across languages becomes increasingly important.
Fighting Spam: Bayesian spam filtering is a statistical technique in which the incidence of words in an
email is measured against their typical occurrence in a corpus of spam and non-spam emails.
Information Extraction: extracting value from unstructured data.
Summarization: summarizing the meaning of documents and information.
Question Answering: answering specific questions posed by humans.

A big focus of Google's efforts in NLP has been to recognize
natural language questions, extract the meaning, and provide
the answer.
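The Bayesian spam filtering idea listed under Applications can be sketched as a small naive Bayes classifier that compares how often each word occurs in spam and in non-spam ("ham") training messages. The four training messages are invented, the function names are assumptions, and add-one smoothing is one common choice rather than the only one. The sketch also assumes at least one message of each class.

```python
import math
from collections import Counter

def train(messages):
    """Count words per class from (text, label) pairs; label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts

def spam_log_odds(text, word_counts, doc_counts):
    """Return log P(spam | text) - log P(ham | text); positive means spam."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    totals = {label: sum(counts.values()) for label, counts in word_counts.items()}
    score = math.log(doc_counts["spam"] / doc_counts["ham"])  # class prior
    for word in text.lower().split():
        # Add-one (Laplace) smoothing so unseen words do not give zero probability.
        p_spam = (word_counts["spam"][word] + 1) / (totals["spam"] + len(vocab))
        p_ham = (word_counts["ham"][word] + 1) / (totals["ham"] + len(vocab))
        score += math.log(p_spam / p_ham)
    return score

training = [
    ("win free money now", "spam"),
    ("free prize win", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]
word_counts, doc_counts = train(training)
print(spam_log_odds("free money", word_counts, doc_counts) > 0)     # True
print(spam_log_odds("lunch at noon", word_counts, doc_counts) > 0)  # False
```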
Top 5 companies focussing on NLP
(1) NLP in Voice Recognition: Expect Labs was established to build tools that enable companies
to create intelligent voice-driven interfaces for any app or device.
(2) NLP in Text Prediction: SwiftKey is an innovative startup that creates text prediction
technology designed to significantly boost the accuracy, fluency and speed of text entry on
mobile and computing devices.
(3) NLP in Social Media Analysis: NetBase is an innovative company that applies NLP technologies
to data from the social web to perform social media sentiment analysis.
(4) NLP in Predicting Government Legislation: FiscalNote is a technology company that offers
products for analyzing political, legal, and regulatory information using NLP and machine learning.
(5) NLP in eCommerce Analysis: Klevu is a Finnish technology startup that offers a smart search
function for small and medium-sized web stores and shops.
Thank you
