Natural Language Processing
By Ajit Singh
Ajit Singh
Ajit Singh is equally interested in fiction and non-fiction and has written many books in English, Hindi, and Urdu. He has performed in Haryana, published his prose and verse in India and Pakistan, and participated in an international online poetry symposium organized by Bazm-e-Urdu, Qatar. He lives in a village, teaches science, and comes from a farming family. His father served as a major in the Parachute Regiment of the Indian Army. Ajit plays cricket, football, volleyball, basketball, badminton, and chess. He loves the harmonium and flute, sings folk songs, and also enjoys gardening in his spare time. His nickname is "Badal," which means "cloud" in English.
Copyrighted Material
Natural Language Processing
Copyright © 2019 by Ajit Singh. All Rights Reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means electronic, mechanical, photocopying, recording or otherwise without prior written permission from the author, except for the inclusion of brief quotations in a review.
For information about this title or to order other books and/or electronic media, contact the publisher:
Ajit Singh
ajit_singh24@yahoo.com
http://www.ajitvoice.in
Published by Ajit Singh at Smashwords.
Library of Congress Control Number: (N/A)
ISBN: A/F
Cover and Interior design: Ajit Singh.
Smashwords Edition, License Notes
This ebook is licensed for your personal enjoyment only. This ebook may not be re-sold or given away to other people. If you would like to share this book with another person, please purchase an additional copy for each recipient. If you’re reading this book and did not purchase it, or it was not purchased for your use only, then please return to your favorite ebook retailer and purchase your own copy. Thank you for respecting the hard work of this author.
Preface
NLP is a large and multidisciplinary field, so this book can only provide a very general introduction. The first chapter gives an overview of the main subareas, with a brief account of the main applications and the methodologies that have been employed. The history of NLP is briefly discussed to put this into perspective. The next three chapters describe some of the main subareas in more detail. The organization is based on increasing 'depth' of processing, starting with relatively surface-oriented techniques and progressing to the meaning of sentences and the meaning of utterances in context. Each chapter considers the subarea as a whole and then describes one or more sample algorithms that tackle particular problems. The algorithms have been chosen because they are relatively straightforward to describe and because they illustrate a specific technique that has been shown to be useful. The idea is to exemplify an approach, not to give a detailed survey (which would be impossible in the time available), although other approaches will sometimes be discussed briefly. The final chapter brings the preceding material together to describe the state of the art in sample applications.
This book aims to introduce the fundamental techniques of natural language processing, to develop an understanding of the limits of those techniques and of current research issues, and to evaluate some current and potential applications.
Objectives
By the end of this book, students should:
Be able to describe the architecture of, and basic design for, a generic NLP system 'shell'.
Be able to discuss the current and likely future performance of several NLP applications, such as machine translation and email response.
Be able to briefly describe a fundamental technique for processing language for several subtasks, such as morphological analysis, syntactic parsing and word sense disambiguation.
Understand how these techniques draw on and relate to other areas of (theoretical) computer science, such as formal language theory, formal semantics of programming languages, or theorem proving.
Key Features
A discussion of the main problems involved in language processing, illustrated by examples taken from NLP applications. It also introduces some methodological distinctions and puts the applications and methodology into historical context.
A discussion of morphology, concentrating mainly on English morphology. The concept of a lexicon in an NLP system is discussed with respect to morphological processing. Spelling rules are introduced and the use of finite state transducers to implement spelling rules is explained.
An introduction to some simple statistical techniques, illustrating their use in NLP for predicting words and part-of-speech categories. It starts with a discussion of corpora and then introduces word prediction, which can be seen as a way of (crudely) modeling some syntactic information (i.e., word order).
NLP with Python
DIY Corpus
Chapter 1
Introduction to NLP
People communicate in many different ways: through speaking and listening, making gestures, using specialized hand signals (such as when driving or directing traffic), using sign languages for the deaf, or through various forms of text.
By text we mean words that are written or printed on a flat surface (paper, card, street signs and so on) or displayed on a screen or electronic device in order to be read by their intended recipient (or by whoever happens to be passing by).
This book will focus only on the last of these: we will be concerned with various ways in which computer systems can analyze and interpret texts, and we will assume for convenience that these texts are presented in an electronic format. This is of course quite a reasonable assumption, given the huge amount of text we can access via the World Wide Web and the increasing availability of electronic versions of newspapers, novels, textbooks and indeed subject guides. This chapter introduces some essential concepts, techniques and terminology that will be applied in the rest of the course. Some material in this chapter is a little technical but no programming is involved at this stage.
We will begin by considering texts as strings of characters which can be broken up into sub-strings, and introduce some techniques for informally describing patterns of various kinds that occur in texts. We will then begin to motivate the analysis of texts in terms of hierarchical structures in which elements of various kinds can be embedded within each other, much like the elements that make up an HTML web document. This section introduces some technical machinery, such as finite-state machines (FSMs), regular expressions, regular grammars and context-free grammars.
Basic concepts
Tokenized text and Pattern matching
One of the more basic operations that can be applied to a text is tokenizing: breaking up a stream of characters into words, punctuation marks, numbers and other discrete items. So for example the character string
“Dr. Watson, Mr. Sherlock Holmes”, said Stamford, introducing us.
can be tokenized as follows, where each token is enclosed in single quotation marks:
'“' 'Dr.' 'Watson' ',' 'Mr.' 'Sherlock' 'Holmes' '”' ',' 'said' 'Stamford' ',' 'introducing' 'us' '.'
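As a rough illustration of tokenizing, the following Python sketch splits a string into tokens with a single regular expression. It is only one possible approach, not a method prescribed by this book: the names ABBREVIATIONS, TOKEN_PATTERN and tokenize are made up for this example, and the short abbreviation list is an assumption that keeps titles such as 'Dr.' and 'Mr.' together with their full stop. Plain double quotes are used in place of typographic ones.

```python
import re

# Hypothetical, deliberately short list of abbreviations whose
# trailing full stop should stay attached to the word.
ABBREVIATIONS = ("Dr.", "Mr.", "Mrs.", "Ms.", "Prof.")

# Alternatives are tried left to right at each position:
# known abbreviations first, then runs of word characters,
# then any single character that is neither a word character
# nor whitespace (i.e. a punctuation mark).
TOKEN_PATTERN = re.compile(
    "|".join(re.escape(abbr) for abbr in ABBREVIATIONS)
    + r"|\w+"
    + r"|[^\w\s]"
)


def tokenize(text):
    """Return the list of tokens found in text, in order."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    sentence = '"Dr. Watson, Mr. Sherlock Holmes", said Stamford, introducing us.'
    for token in tokenize(sentence):
        print(repr(token))
```

A fixed abbreviation list is the simplest way to stop 'Dr.' being split into 'Dr' and '.', but it will miss any abbreviation that is not listed, so a realistic tokenizer would need a fuller list or a different strategy.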
At this level, words have not been classified into grammatical categories and we have very little indication of syntactic structure. Still, a fair amount of information can be obtained from relatively shallow analysis of tokenized text. For example, suppose we want to develop a procedure for finding all personal names in a given text. In English, personal names normally start with capital letters, but that is not enough to distinguish them from names of countries, cities, companies, racehorses and so on, or from capitalization at the start of a sentence. Some additional ways to identify personal names include:
Use of a title: Dr., Mr., Mrs., Miss, Professor and so on.
A capitalized word or words followed by a comma and a number, usually below 100: this is a common way of referring to people in news reports, where the number stands for their age, for example Pierre