
R. SHANMUGAM, M.A., M.Phil, PhD
PROJECT ENGINEER
CDAC GIST, R&D, PUNE

Natural Language Processing


Natural Language Processing (NLP) is an emerging field in which attempts are being made to make computers understand human natural languages as humans understand them.

Computational Linguistics (CL)


To achieve the aim of NLP, the ways and means that have to be provided to the computer are being studied. This branch is called Computational Linguistics (CL). It proposes many methods, formalisms and algorithms for this purpose. It is an interdisciplinary field involving Linguistics, Mathematics, Computer Science, Electronics and Statistics.

Parsing
Morphological Parsing (Word Parsing): words and their grammatical meaning
Syntactic Parsing: sentence and its grammatical meaning

To develop a Parser.

Tamil Morphology (about suffixes)
Tamil Morphophonemic Rules (Sandhi)
Tamil Morphotactics
Computational formalism (RE, FSA)
Database
Programming language

Tamil Morphology (about suffixes)
Suffixes for Noun: Plural, Case, Postposition, clitics
Suffixes for Verb: Tense, PNG (person, number, gender), aspectual, modals, clitics

Tamil Morphophonemic Rules (Sandhi)
Addition
+ =

Deletion
+ =

Substitution
+ =

Tamil Morphotactics
Noun: Root + PL + Case + Postposition + clitics

Verb: Root + AspAux + VoiceAux + ModAux + Tns + PNG + Cl3 + Cl4
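
The slot orders above can be checked mechanically. The following is a minimal Python sketch, assuming that every slot after Root is optional and that each slot occurs at most once; the slide does not state either point, and the names NOUN_SLOTS, VERB_SLOTS and follows_template are hypothetical.

```python
# Slot orders copied from the morphotactics slide.
NOUN_SLOTS = ["Root", "PL", "Case", "Postposition", "Clitic"]
VERB_SLOTS = ["Root", "AspAux", "VoiceAux", "ModAux", "Tns", "PNG", "Cl3", "Cl4"]


def follows_template(tags, slots):
    """Return True if tags start with Root and respect the slot order.

    Assumption (not from the slide): slots after Root are optional
    and no slot may be repeated.
    """
    if not tags or tags[0] != "Root":
        return False
    position = -1
    for tag in tags:
        if tag not in slots:
            return False
        index = slots.index(tag)
        if index <= position:  # out of order or repeated
            return False
        position = index
    return True


print(follows_template(["Root", "PL", "Case"], NOUN_SLOTS))   # plural before case
print(follows_template(["Root", "Case", "PL"], NOUN_SLOTS))   # case before plural
print(follows_template(["Root", "Tns", "PNG"], VERB_SLOTS))   # tense then PNG
```

A parser that has already labelled the morphemes of a word could use such a check to reject analyses whose suffixes appear in an impossible order.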


Computational formalism (RE, FSA)
Regular expression (RE)
A regular expression is the standard notation for characterizing strings (combinations of characters). It is a formula in a special language for specifying simple classes of strings. Formally, it is an algebraic notation for characterizing sets of strings. Regular expressions were introduced by Kleene (1956). A string is any sequence of characters such as letters, numbers, spaces, tabs, and punctuation; a space is also a character because it has an encoding value. A regular expression needs a pattern (search type) to search strings.
Example: avaN puththakam patiththaaN ; /puththakam/ (book)
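
The pattern search in the example can be tried directly. This is a minimal sketch using Python's standard re module; the word spacing in the sentence is an assumption about the original slide, and the variable names are illustrative.

```python
import re

# Example sentence and pattern from the slide (word spacing assumed).
sentence = "avaN puththakam patiththaaN"
pattern = re.compile(r"puththakam")  # /puththakam/ (book)

match = pattern.search(sentence)
if match:
    # span() gives the start and end offsets of the matched string.
    print(match.group(), match.span())
else:
    print("no match")
```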

Finite state automaton (FSA)
A finite state automaton is a mathematical device used to implement regular expressions. Finite state automata are the theoretical foundation of a good deal of computational work. Any finite state automaton can be described with a regular expression.
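
To illustrate the correspondence between the two formalisms, here is a small Python sketch of a deterministic finite state automaton together with a regular expression for the same language. The language used, a noun root followed by an optional plural and an optional case marker, is a reduced fragment chosen for illustration and is an assumption; the state names and function names are hypothetical.

```python
import re

# Transition table of a small deterministic FSA over morpheme tags.
# Assumed fragment (illustrative only): Root, then optional PL, then optional Case.
TRANSITIONS = {
    ("start", "Root"): "root",
    ("root", "PL"): "plural",
    ("root", "Case"): "case",
    ("plural", "Case"): "case",
}
ACCEPTING = {"root", "plural", "case"}


def accepts(tags):
    """Run the automaton over a list of tags."""
    state = "start"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


# The same language written as a regular expression over the tag string.
NOUN_RE = re.compile(r"Root( PL)?( Case)?")


def accepts_re(tags):
    return NOUN_RE.fullmatch(" ".join(tags)) is not None


for tags in (["Root"], ["Root", "PL", "Case"], ["Root", "Case", "PL"]):
    print(tags, accepts(tags), accepts_re(tags))
```

Both functions accept and reject the same tag sequences, which is the sense in which the automaton can be described by a regular expression.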

ISSUES IN MORPHOLOGICAL PARSING

1. Similarities between the suffixes and part of the suffixes.
2. Similarities between suffixes and part of the roots.
3. Similar suffixes in different categories.
4. Oblique forms of pronouns
5. Existence of Glides, Sandhi, and Fillers.
6. Lack of vocabulary
7. Stems
8. Ambiguity
9. Exceptions

Similarities between the suffixes and part of the suffixes

Postposition versus postposition:
aaka
maaRaaka, neeraka, muulamaaka, vaayilaaka, vaziyaaka



Ranking of the suffixes
1.
2.
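
One common way to rank suffixes that overlap is to try the longest candidate first, so that a short postposition such as aaka does not pre-empt a longer one that ends in the same letters. The Python sketch below shows that idea only; the ranking actually used on the slide is not recoverable, so longest-first ordering is an assumption, and split_postposition and the placeholder stem "STEM" are hypothetical.

```python
# Postpositions listed on the slide.
POSTPOSITIONS = ["aaka", "maaRaaka", "neeraka", "muulamaaka", "vaayilaaka", "vaziyaaka"]


def split_postposition(word, suffixes=POSTPOSITIONS):
    """Return (stem, suffix) using the longest matching suffix, or None.

    Assumption: suffixes are ranked by length, longest first.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return None


# "STEM" is a placeholder, not a Tamil word.
print(split_postposition("STEM" + "muulamaaka"))
print(split_postposition("STEM" + "aaka"))
```

Without the length ranking, the first call could strip only aaka and leave muulam attached to the stem.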

Similarities between suffixes and part of the roots.

1. maraththai (tree-Acc)
Stem: marathth
Stem: mara
2. vaaththai (duck-Acc)
Stem: vaathth
Stem: vaa
Compare remaining syllables

Similar suffixes in different categories

vai:
vaankkivai (receive it): Aspectual
Pookavai (make him to go): Voice
Ceythu form, ceya form

Oblique forms of pronouns

$word =~ s/eN/naaN(pirathipeyar)/;
$word =~ s/nam/naam(pirathipeyar)/;
$word =~ s/thaN/thaaN(pirathipeyar)/;
$word =~ s/tham/thaam(pirathipeyar)/;
$word =~ s/num/niir(pirathipeyar)/;
$word =~ s/um/niir(pirathipeyar)/;
$word =~ s/em/naam(pirathipeyar)/;
$word =~ s/thankkaL/thaankkaL(pirathipeyar)/;
(s is used for substitution)
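
The same mapping can also be held in a lookup table, which avoids depending on the order of the substitutions (for example um also occurs inside num). This is a minimal Python sketch under the assumption that the oblique stem has already been isolated from the rest of the word; the names OBLIQUE_TO_NOMINATIVE and restore_pronoun are hypothetical.

```python
# Oblique stem -> nominative pronoun, copied from the substitutions above.
OBLIQUE_TO_NOMINATIVE = {
    "eN": "naaN",
    "nam": "naam",
    "thaN": "thaaN",
    "tham": "thaam",
    "num": "niir",
    "um": "niir",
    "em": "naam",
    "thankkaL": "thaankkaL",
}


def restore_pronoun(stem):
    """Return the nominative pronoun for an oblique stem, or None.

    Assumption: the stem is passed in on its own, with suffixes already removed.
    """
    return OBLIQUE_TO_NOMINATIVE.get(stem)


print(restore_pronoun("eN"))
print(restore_pronoun("num"))
print(restore_pronoun("maram"))  # not an oblique pronoun stem
```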

Existence of glides, sandhi, filler
Theru-v-ai-y-otti-y-ee (Glide) (near the street)
avaNaippaRRiththaaN (Sandhi)
maraththiNai (Filler) (tree-Acc)

Lack of vocabulary
suNaami ()

Stems
Input                             Stem        Deletion
1. maNnNnai (soil-Acc)            maNnNnNn    last two characters
2. eNNai (I-Acc)                  eNN(N)      last one character
3. pallai (tooth-Acc)             pall(l)     last one character
4. ceyyaamal (without doing)      ceYY(Y)     last one character
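
The deletion step in the table can be sketched as follows: once the suffix is removed, a doubled unit at the end of the remaining stem is reduced to a single one. This Python sketch assumes that the doubled unit is either two transliteration characters (as in Nn) or one character, and that the suffix to remove is already known; both are assumptions for illustration, and strip_suffix and undouble are hypothetical names.

```python
def strip_suffix(word, suffix):
    """Remove a known suffix from the end of a word."""
    if not word.endswith(suffix):
        raise ValueError(f"{word!r} does not end with {suffix!r}")
    return word[: -len(suffix)]


def undouble(stem):
    """Delete one copy of a doubled unit at the end of the stem.

    Assumption: the unit is two characters (e.g. "Nn") or one character.
    """
    for size in (2, 1):
        unit = stem[-size:]
        if len(unit) == size and stem[-2 * size:] == unit * 2:
            return stem[:-size]
    return stem


for word, suffix in [("maNnNnai", "ai"), ("eNNai", "ai"),
                     ("pallai", "ai"), ("ceyyaamal", "aamal")]:
    print(word, "->", undouble(strip_suffix(word, suffix)))
```

The rule is deliberately naive; stems that legitimately end in a doubled consonant would need to be listed as exceptions.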

Ambiguity
Input: neythaaN
neythaaN: ney + th + aaN (Wove cloth - He)
neythaaN: ney + than (it is ghee)

Input: kaalai
kaalai (Morning)
kaal + ai (leg-Acc)
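
Ambiguity of this kind shows up as soon as a parser enumerates every way of splitting a word into a root and suffixes. The Python sketch below does this for a tiny lexicon; the lexicon contents are an illustrative assumption built from the forms on this slide, and ROOTS, SUFFIXES, split_suffixes and analyses are hypothetical names.

```python
# Tiny illustrative lexicon (assumed, not the author's database).
ROOTS = {"ney", "kaal", "kaalai"}
SUFFIXES = {"th", "aaN", "ai"}


def split_suffixes(rest, suffixes):
    """Return every way of splitting rest into a sequence of suffixes."""
    if not rest:
        return [[]]
    results = []
    for suffix in sorted(suffixes):
        if rest.startswith(suffix):
            for tail in split_suffixes(rest[len(suffix):], suffixes):
                results.append([suffix] + tail)
    return results


def analyses(word, roots=ROOTS, suffixes=SUFFIXES):
    """Return every root + suffix segmentation the lexicon allows."""
    results = []
    for root in sorted(roots):
        if word.startswith(root):
            for tail in split_suffixes(word[len(root):], suffixes):
                results.append([root] + tail)
    return results


print(analyses("kaalai"))    # two competing analyses
print(analyses("neythaaN"))
```

When more than one analysis is returned, the parser cannot choose between them from the word alone.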

Ambiguity
avarkaLootu iNnaiya maaNaatu ceNRaaN.

Context knowledge will play a vital role.

THANK YOU
