Natural Language Processing
Ebook, 117 pages, 1 hour


About this ebook

NLP is a large and multidisciplinary field, so this course can only provide a very general introduction. The first chapter is designed to give an overview of the main subareas and a very brief idea of the main applications and the methodologies which have been employed. The history of NLP is briefly discussed as a way of putting this into perspective. The next three chapters describe some of the main subareas in more detail. The organisation is roughly based on increased `depth' of processing, starting with relatively surface-oriented techniques and progressing to considering meaning of sentences and meaning of utterances in context. Each chapter will consider the subarea as a whole and then go on to describe one or more sample algorithms which tackle particular problems. The algorithms have been chosen because they are relatively straightforward to describe and because they illustrate a specific technique which has been shown to be useful, but the idea is to exemplify an approach, not to give a detailed survey (which would be impossible in the time available). However, other approaches will sometimes be discussed briefly. The final chapter brings the preceding material together in order to describe the state of the art in sample applications.

After working through this book, students should:

1. Be able to describe the architecture of and basic design for a generic NLP system `shell'.

2. Be able to discuss the current and likely future performance of several NLP applications, such as machine translation and email response.

3. Be able to briefly describe a fundamental technique for processing language for several subtasks, such as morphological analysis, syntactic parsing, and word sense disambiguation.

4. Understand how these techniques draw on and relate to other areas of (theoretical) computer science, such as formal language theory, formal semantics of programming languages, or theorem proving.

Language: English
Publisher: Ajit Singh
Release date: May 30, 2019
ISBN: 9780463643020
Author

Ajit Singh

Ajit Singh is equally interested in fiction and non-fiction and has written many books in English, Hindi, and Urdu. He has performed in Haryana, published his prose and verse in India and Pakistan, and participated in an international online poetry symposium organized by Bazm-e-Urdu, Qatar. He lives in a village, teaches science, and comes from a farming family. His father served as a major in the Parachute Regiment of the Indian Army. Ajit plays cricket, football, volleyball, basketball, badminton, and chess. He loves harmonium and flute, sings folk songs, and also enjoys gardening in his spare time. His nickname is "Badal," which means "cloud" in English.


    Book preview

    Natural Language Processing - Ajit Singh

    Copyrighted Material

    Natural Language Processing

    Copyright © 2019 by Ajit Singh. All Rights Reserved.

    No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means electronic, mechanical, photocopying, recording or otherwise without prior written permission from the author, except for the inclusion of brief quotations in a review.

    For information about this title or to order other books and/or electronic media, contact the publisher:

    Ajit Singh

    ajit_singh24@yahoo.com

    http://www.ajitvoice.in

    Published by Ajit Singh at Smashwords.

    Library of Congress Control Number: (N/A)

    ISBN: A/F

    Cover and Interior design: Ajit Singh.

    Smashwords Edition, License Notes

    This ebook is licensed for your personal enjoyment only. This ebook may not be re-sold or given away to other people. If you would like to share this book with another person, please purchase an additional copy for each recipient. If you’re reading this book and did not purchase it, or it was not purchased for your use only, then please return to your favorite ebook retailer and purchase your own copy. Thank you for respecting the hard work of this author.

    Preface

    NLP is a large and multidisciplinary field, so this book can only provide a very general introduction. The first chapter is designed to give an overview of the main subareas and a very brief idea of the main applications and the methodologies which have been employed. The history of NLP is briefly discussed as a way of putting this into perspective. The next three chapters describe some of the main subareas in more detail. The organization is based on increased `depth' of processing, starting with relatively surface-oriented techniques and progressing to considering meaning of sentences and meaning of utterances in context. Each chapter will consider the subarea as a whole and then go on to describe one or more sample algorithms which tackle particular problems. The algorithms have been chosen because they are relatively straightforward to describe and because they illustrate a specific technique which has been shown to be useful, but the idea is to exemplify an approach, not to give a detailed survey (which would be impossible in the time available). However, other approaches will sometimes be discussed briefly. The final chapter brings the preceding material together in order to describe the state of the art in sample applications.

    This book aims to introduce the fundamental techniques of natural language processing, to develop an understanding of the limits of those techniques and of current research issues, and to evaluate some current and potential applications.

    Objectives

    After working through this book, students should:

    Be able to describe the architecture of and basic design for a generic NLP system `shell'.

    Be able to discuss the current and likely future performance of several NLP applications, such as machine translation and email response.

    Be able to briefly describe a fundamental technique for processing language for several subtasks, such as morphological analysis, syntactic parsing, and word sense disambiguation.

    Understand how these techniques draw on and relate to other areas of (theoretical) computer science, such as formal language theory, formal semantics of programming languages, or theorem proving.

    Key Features

    Discussion of the main problems involved in language processing, by means of examples taken from NLP applications. It draws some methodological distinctions and puts the applications and methodology into some historical context.

    Discussion of morphology, concentrating mainly on English morphology. The concept of a lexicon in an NLP system is discussed with respect to morphological processing. Spelling rules are introduced and the use of finite state transducers to implement spelling rules is explained.

    Introduces some simple statistical techniques and illustrates their use in NLP for prediction of words and part-of-speech categories. It starts with a discussion of corpora, and then introduces word prediction. Word prediction can be seen as a way of (crudely) modeling some syntactic information (i.e., word order).

    NLP with Python

    DIY Corpus

    Chapter 1

    Introduction to NLP

    People communicate in many different ways: through speaking and listening, making gestures, using specialized hand signals (such as when driving or directing traffic), using sign languages for the deaf, or through various forms of text.

    By text we mean words that are written or printed on a flat surface (paper, card, street signs and so on) or displayed on a screen or electronic device in order to be read by their intended recipient (or by whoever happens to be passing by).

    This book will focus only on the last of these: we will be concerned with various ways in which computer systems can analyze and interpret texts, and we will assume for convenience that these texts are presented in an electronic format. This is of course quite a reasonable assumption, given the huge amount of text we can access via the World Wide Web and the increasing availability of electronic versions of newspapers, novels, textbooks and indeed subject guides. This chapter introduces some essential concepts, techniques and terminology that will be applied in the rest of the course. Some material in this chapter is a little technical but no programming is involved at this stage.

    We will begin by considering texts as strings of characters which can be broken up into sub-strings, and introduce some techniques for informally describing patterns of various kinds that occur in texts. Subsequently we will begin to motivate the analysis of texts in terms of hierarchical structures in which elements of various kinds can be embedded within each other, in a comparable way to the elements that make up an HTML web document. This section introduces some technical machinery, such as finite-state machines (FSMs), regular expressions, regular grammars and context-free grammars.

    Basic concepts

    Tokenized text and Pattern matching

    One of the more basic operations that can be applied to a text is tokenizing: breaking up a stream of characters into words, punctuation marks, numbers and other discrete items. So for example the character string

    “Dr. Watson, Mr. Sherlock Holmes”, said Stamford, introducing us.

    can be tokenized as in the following example, where each token is enclosed in single quotation marks:

    `“' `Dr.' `Watson' `,' `Mr.' `Sherlock' `Holmes' `”' `,' `said' `Stamford' `,' `introducing' `us' `.'
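
    As a rough illustration of tokenizing, the following Python sketch splits a string into word and punctuation tokens. It is only one possible approach, not the book's own method: the names ABBREVIATIONS, TOKEN_PATTERN and tokenize are invented for this example, and the small hand-written list of titles is an assumption that is needed so that `Dr.' and `Mr.' keep their full stop while a sentence-final word such as `us.' is split in two.

```python
import re

# Assumption: a tiny, hand-written list of abbreviations that keep their
# trailing full stop. A real system would need a much larger list.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}

# A token candidate is either a run of word characters with an optional
# trailing full stop, or a single non-space, non-word character.
TOKEN_PATTERN = re.compile(r"\w+\.?|[^\w\s]")


def tokenize(text):
    """Split text into a list of word and punctuation tokens."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group()
        # Detach a trailing full stop unless the token is a known abbreviation.
        if len(token) > 1 and token.endswith(".") and token not in ABBREVIATIONS:
            tokens.extend([token[:-1], "."])
        else:
            tokens.append(token)
    return tokens


sentence = "“Dr. Watson, Mr. Sherlock Holmes”, said Stamford, introducing us."
print(tokenize(sentence))
```

    On the Sherlock Holmes sentence this sketch yields the same fifteen tokens as listed above. It is deliberately crude: a decimal number such as 3.14 would be split into three tokens, and any abbreviation missing from the list would lose its full stop.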

    At this level, words have not been classified into grammatical categories and we have very little indication of syntactic structure. Still, a fair amount of information may be obtained from relatively shallow analysis of tokenized text. For example, suppose we want to develop a procedure for finding all personal names in a given text. We know that personal names always start with capital letters, but that is not enough to distinguish them from names of countries, cities, companies, racehorses and so on, or from capitalization at the start of a sentence. Some additional ways to identify personal names include:

    Use of a title: Dr., Mr., Mrs., Miss, Professor and so on.

    A capitalized word or words followed by a comma and a number, usually below 100: this is a common way of referring to people in news reports, where the number stands for their age, for example Pierre
