
Artificial Intelligence (AI)

Artificial intelligence, commonly abbreviated as AI and also known as machine intelligence, is the practice of developing algorithms that enable machines (usually computers) to make seemingly intelligent decisions, or to act as if they possessed human-scale intelligence.

Overview
A widely accepted definition of artificial intelligence was put forth by John McCarthy in 1955: "making a machine behave in ways that would be called intelligent if a human were so behaving." Since that time several distinct types of artificial intelligence have been elucidated.

Strong artificial intelligence


Strong artificial intelligence deals with the creation of some form of computer-based artificial intelligence that can truly reason and solve problems; a strong form of AI is said to be sentient, or self-aware. In theory, there are two types of strong AI: human-like AI, in which the computer program thinks and reasons much like a human mind, and non-human-like AI, in which the computer program develops a totally non-human sentience and a non-human way of thinking and reasoning.

Weak artificial intelligence


Weak artificial intelligence deals with the creation of some form of computer-based artificial intelligence that cannot truly reason and solve problems; such a machine would, in some ways, act as if it were intelligent, but it would not possess true intelligence or sentience. To date, much of the work in this field has been done with computer simulations of intelligence based on predefined sets of rules. Very little progress has been made in strong AI. Depending on how one defines one's goals, a moderate amount of progress has been made in weak AI.

Development of AI theory
Much of the original focus of artificial intelligence research draws from an experimental approach to psychology and emphasizes what may be called linguistic intelligence (best exemplified in the Turing test). Approaches to artificial intelligence that do not focus on linguistic intelligence include robotics and collective intelligence approaches, which focus on active manipulation of an environment or on consensus decision making, and draw from biology and political science when seeking models of how "intelligent" behavior is organized. Artificial intelligence theory also draws from animal studies, in particular with insects, which are easier to emulate as robots (see artificial life), as well as animals with more complex cognition. AI researchers argue that animals, which are simpler than humans, ought to be considerably easier to mimic, but satisfactory computational models of animal intelligence are not yet available.

Seminal papers advancing the concept of machine intelligence include "A Logical Calculus of the Ideas Immanent in Nervous Activity" (1943) by Warren McCulloch and Walter Pitts, "Computing Machinery and Intelligence" (1950) by Alan Turing, and "Man-Computer Symbiosis" (1960) by J.C.R. Licklider. See cybernetics and the Turing test for further discussion. There were also early papers that denied the possibility of machine intelligence on logical or philosophical grounds, such as "Minds, Machines and Gödel" (1961) by John Lucas.

With the development of practical techniques based on AI research, advocates of AI have argued that opponents of AI have repeatedly changed their position on tasks such as computer chess or speech recognition that were previously regarded as "intelligent", in order to deny the accomplishments of AI. They point out that this moving of the goalposts effectively defines "intelligence" as "whatever humans can do that machines cannot". John von Neumann (quoted by E.T. Jaynes) anticipated this in 1948 when, in response to a comment at a lecture that it was impossible for a machine to think, he said: "You insist that there is something a machine cannot do. If you will tell me precisely what it is that a machine cannot do, then I can always make a machine which will do just that!" Von Neumann was presumably alluding to the Church-Turing thesis, which states that any effective procedure can be simulated by a (generalized) computer. In 1969 McCarthy and Hayes started the discussion about the frame problem with their essay, "Some Philosophical Problems from the Standpoint of Artificial Intelligence".

Experimental AI research
Artificial intelligence began as an experimental field in the 1950s with such pioneers as Allen Newell and Herbert Simon, who founded the first artificial intelligence laboratory at Carnegie Mellon University, and John McCarthy and Marvin Minsky, who founded the MIT AI Lab in 1959. They all attended the Dartmouth College summer AI conference in 1956, which was organized by McCarthy, Minsky, and Nathaniel Rochester of IBM.

Historically, there are two broad styles of AI research: the "neats" and the "scruffies". "Neat", classical or symbolic AI research generally involves symbolic manipulation of abstract concepts, and is the methodology used in most expert systems. Parallel to this are the "scruffy", or "connectionist", approaches, of which neural networks are the best-known example; these try to "evolve" intelligence by building systems and then improving them through some automatic process, rather than by systematically designing something to complete the task. Both approaches appeared very early in AI history. Throughout the 1960s and 1970s scruffy approaches were pushed into the background, but interest was regained in the 1980s when the limitations of the "neat" approaches of the time became clearer. However, it has become clear that contemporary methods using both broad approaches have severe limitations.

Artificial intelligence research was very heavily funded in the 1980s by the Defense Advanced Research Projects Agency in the United States and by the Fifth Generation Project in Japan. The failure of the work funded at the time to produce immediate results, despite the grandiose promises of some AI practitioners, led to correspondingly large cutbacks in funding by government agencies in the late 1980s, and to a general downturn in activity in the field known as the AI winter. Over the following decade, many AI researchers moved into related areas with more modest goals such as machine learning, robotics, and computer vision, though research in pure AI continued at reduced levels.

Practical applications of AI techniques


While progress towards the ultimate goal of human-like intelligence has been slow, many spinoffs have come out of the process. Notable examples include the languages LISP and Prolog, which were invented for AI research but are now used for non-AI tasks. Hacker culture first sprang from AI laboratories, in particular the MIT AI Lab, home at various times to such luminaries as McCarthy, Minsky, Seymour Papert (who developed Logo there), and Terry Winograd (who abandoned AI after developing SHRDLU).

Philosophical criticisms of AI
Several philosophers, notably John Searle and Hubert Dreyfus, have argued on philosophical grounds against the feasibility of building human-like consciousness or intelligence in a disembodied machine. Searle is best known for his Chinese room argument, which claims to demonstrate that even a machine that passed the Turing test would not necessarily be conscious in the human sense. Dreyfus, in his book What Computers Can't Do, has argued that consciousness cannot be captured by rule- or logic-based systems or by systems that are not attached to a physical body, but leaves open the possibility that a robotic system using neural networks or similar mechanisms might achieve artificial intelligence.

Hypothetical consequences of AI
Some observers foresee the development of systems that are far more intelligent and complex than anything currently known. One name for these hypothetical systems is artilects. With the introduction of artificially intelligent non-deterministic systems, many ethical issues will arise, many of which humanity has never before encountered. Over time, debates have tended to focus less and less on "possibility" and more on "desirability", as emphasized in the "Cosmist" (versus "Terran") debates initiated by Hugo de Garis and Kevin Warwick. A Cosmist, according to de Garis, actively seeks to build more intelligent successors to the human species. The emergence of this debate suggests that desirability questions may also have influenced some of the early thinkers "against".

Artificial Intelligence and Voice Recognition


AI continues to become an increasingly routine part of our daily lives, and voice recognition is a key enabler of the human-machine interface. Although we often talk (or curse) at our computers, the thought of them talking back to us is somewhat disturbing. The ultimate computer with artificial intelligence was HAL in the science fiction classic 2001: A Space Odyssey. HAL could not only understand and respond to human speech, but also determined what was best for the future of mankind, even if that meant sacrificing a few people along the way. While the computers of today do not have the capabilities of HAL, a new era of computers incorporating artificial intelligence has begun, and humans can now communicate with computers through common everyday speech. What started as robotic-sounding voices has evolved into highly sophisticated voice technology, with sales of over $1.2 billion in 2004. Voice technology systems, powered by artificial intelligence, are no longer just an emerging technology; they are being used by companies from BMW to Dell to Frigidaire to Wal-Mart.

Want a dog, but don't want to feed or walk it? Poo-Chi is an interactive dog made by Tiger Electronics. The dog responds to commands through voice recognition, and the company says that Poo-Chi will grow and mature as you train him. Like a real dog, this one can learn tricks like lie down, sit and shake. However, unlike a real dog, these mechanical pets can also learn to sing songs. And, of course, they don't need to be taken outside, fed, or taken to the vet.

Want to talk to your car? Ford Motor Company has developed an advanced voice technology system so you can communicate with your car. New vehicles can be equipped with a conversational speech interface, which uses text-to-speech technology that sounds as if you are talking to another person, not a robot. What can you talk about with your car? Want to play music? The system asks what type of music and then lists what artists are available. Need to make a call? Tell your car to call Steven Smith, and if there is more than one Steven Smith listed, the car will even ask you which one should be called. Also controlled through voice recognition are the navigation system, climate control, retractable roof and personalization preferences. Ford's conversational speech technology currently has a vocabulary of over 50,000 words and, unlike kids and pets, it speaks only when spoken to!

Speech recognition technology is also on the rise in the field of customer service call centers. Instead of pushing "1" for service or "2" for complaints, you now talk directly with the computer to learn your bank balance, find out when your last car payment was received, and get answers to a wide variety of frequent customer service questions. Some of the companies now using speech recognition technology to answer customers' questions are Bank of America, Sprint, United Airlines, Sony, Sears, Ticketmaster and Nike.

The evolving hand of voice recognition technology is a blessing for those with disabilities. Various recognition software programs take the spoken word and translate it to the written word; two such programs are IBM ViaVoice and Dragon NaturallySpeaking. As artificial intelligence keeps expanding its scope, voice recognition programs will become more prevalent throughout our everyday lives. Already standard in Microsoft's Windows XP is a voice recognition program that lets users speak while Microsoft Word types for them. It is not perfect, but it is quite amazing to experience, and many users do not even know the program is right there on their PCs. As users begin to take advantage of the technology and demand grows for better software, typing, and manual interfaces of all kinds, may become a thing of the past. Viewers of Star Trek: The Next Generation and Voyager are well aware of voice commands for everything from making soup to controlling the Holodeck. This method of interaction is well on its way to becoming reality. But even this is a temporary stage in the evolution of man-machine interaction; one day there may be a symbiosis between the two, and implanted chips could eliminate even the need for voice commands. Soon, even HAL may be a distant memory, a representation of the good old days when you actually had to talk to get anything done.

Voice Recognition Definitions: Speech recognition (in many contexts also known as automatic speech recognition, computer speech recognition or, erroneously, voice recognition) is the process of converting a speech signal to a sequence of words by means of an algorithm implemented as a computer program. Speech recognition applications that have emerged over the last few years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), and preparation of structured documents (e.g., a radiology report). Voice recognition or speaker recognition is a related process that attempts to identify the person speaking, as opposed to what is being said. Such systems extract features from speech, model them, and use them to recognize a person from his or her voice. More broadly, voice or speech recognition is the ability of a machine or program to receive and interpret dictation, or to understand and carry out spoken commands.

TERMINOLOGY
The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (phonemes) which make up words. Statistical models of phonemes and words are used to recognise either discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected and continue to be collected for this purpose. There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous, or spontaneous, speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunications networks. Finally, there are the problems of dealing with regional accents, dialects, language spoken by a foreigner, and language which is spoken ungrammatically, which is probably most of it.

Phoneme

1) The smallest unit of sound that is unique.

Example:

The words seat, meat, beat and cheat are different words because their initial sounds (/s/, /m/, /b/, /ch/) are distinct phonemes in English.

There are about 40-50 phonemes in English. For example, "abnormal" is represented as: AE B N AO R M A L

2) The simplest sound is a pure tone, which has a sine waveform. Pure tones are rare.

3) Most sounds, including speech phonemes, are complex waves: they have a dominant or primary frequency, called the fundamental frequency, overlaid with secondary frequencies.

Fundamental Frequency
Fundamental frequency for speech is the rate at which the vocal cords flap against each other when producing a voiced phoneme.

Examples of Complex Waves for Phonemes
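As a rough illustration of such a complex wave, the short Python sketch below adds a fundamental frequency to a few weaker harmonics; the frequencies and amplitudes are invented for illustration, and real phonemes have far richer spectra.

import math

SAMPLE_RATE = 16000  # samples per second, a common rate for speech

def complex_wave(fundamental_hz, harmonics, duration_s=0.02):
    """Approximate a voiced phoneme as a fundamental frequency overlaid
    with weaker secondary (harmonic) frequencies."""
    n_samples = int(SAMPLE_RATE * duration_s)
    wave = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        # the fundamental frequency: the rate of vocal-cord vibration
        sample = math.sin(2 * math.pi * fundamental_hz * t)
        # secondary frequencies with smaller amplitudes
        for k, amp in harmonics:
            sample += amp * math.sin(2 * math.pi * k * fundamental_hz * t)
        wave.append(sample)
    return wave

# e.g. a 120 Hz fundamental with two weaker harmonics (values are illustrative)
samples = complex_wave(120, harmonics=[(2, 0.5), (3, 0.25)])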

Co-articulation

Co-articulation effects are inter-phoneme influences. Because of co-articulation effects, a specific utterance or instance of a phoneme is called a phone.

The three approaches to voice recognition are:

1) Template matching
2) Acoustic-phonetic recognition (e.g., FastTalk)
3) Stochastic processing

TEMPLATE MATCHING
Template matching is the simplest technique and has the highest accuracy when used properly, but it also suffers from the most limitations. As with any approach to voice recognition, the first step is for the user to speak a word or phrase into a microphone. The electrical signal from the microphone is digitized by an "analog-to-digital (A/D) converter", and is stored in memory. To determine the "meaning" of this voice input, the computer attempts to match the input with a digitized voice sample, or template, that has a known meaning. This technique is a close analogy to the traditional command inputs from a keyboard. The program contains the input template, and attempts to match this template with the actual input using a simple conditional statement. Since each person's voice is different, the program cannot possibly contain a template for each potential user, so the program must first be "trained" with a new user's voice input before that user's voice can be recognized by the program. During a training session, the program displays a printed word or phrase, and the user speaks that word or phrase several times into a microphone. The program computes a statistical average of the multiple samples of the same word and stores the averaged sample as a template in a program data structure. With this approach to voice recognition, the program has a "vocabulary" that is limited to the words or phrases used in the training session, and its user base is also limited to those users who have trained the program. This type of system is known as "speaker dependent." It can have vocabularies on the order of a few hundred words and short phrases, and recognition accuracy can be about 98 percent.

So, we can say that in template matching each word or phrase is stored as a separate template. The idea is to select the template that best matches the spoken input (by frame-by-frame comparison), provided the dissimilarity is within a predetermined threshold. Template matching is performed at the word level. Temporal alignment is used to ensure that fast or slow utterances of the same word are not identified as different words; Dynamic Time Warping is used for this temporal alignment, as sketched below.

Dynamic Time Warping
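The following Python sketch shows the idea of template matching with Dynamic Time Warping. It assumes utterances have already been converted into sequences of feature frames; the function names, templates and threshold are illustrative, not part of any particular recognizer.

def frame_distance(a, b):
    """Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def dtw_distance(template, utterance):
    """Dynamic Time Warping: align two sequences of feature frames so that
    fast or slow utterances of the same word still match."""
    n, m = len(template), len(utterance)
    INF = float("inf")
    # cost[i][j] = best alignment cost of template[:i] vs utterance[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(template[i - 1], utterance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch the template
                                 cost[i][j - 1],      # stretch the utterance
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(utterance, templates, threshold):
    """Pick the template with the smallest DTW dissimilarity,
    provided it is within the predetermined threshold."""
    best_word, best_cost = None, float("inf")
    for word, template in templates.items():
        c = dtw_distance(template, utterance)
        if c < best_cost:
            best_word, best_cost = word, c
    return best_word if best_cost <= threshold else None

# templates = {"yes": [...], "no": [...]}   # one feature-frame sequence per word
# word = recognize(utterance_frames, templates, threshold=25.0)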

Robust Template
In early systems there was one template for one example (token). To handle variability, many templates of the same word are stored. A robust template is created from more than one token of the same word using mathematical averages and statistical methods.

Advantage: Performs well with small vocabularies of phonetically distinct words. Mid-size vocabularies in the range of 1,000-10,000 words are possible if the number of vocabulary choices active at any one time is kept minimal.

Disadvantage: Must have at least one template for each word in the application vocabulary. Not good with large vocabularies containing words that have similar sounds (confusable words, e.g., "to" and "two").
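As a rough illustration of how a robust template might be built from several tokens, the sketch below simply averages them frame by frame; it assumes the tokens have already been time-aligned to the same length (in practice DTW alignment would be applied first).

def robust_template(tokens):
    """Build a robust template by averaging several tokens (examples)
    of the same word, frame by frame.
    Assumes the tokens have already been time-aligned to equal length."""
    n_frames = len(tokens[0])
    n_feats = len(tokens[0][0])
    template = []
    for i in range(n_frames):
        frame = [sum(tok[i][f] for tok in tokens) / len(tokens)
                 for f in range(n_feats)]
        template.append(frame)
    return template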

ACOUSTIC PHONETIC RECOGNITION


Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final result, as in applications such as command and control, data entry, and document preparation, or they can serve as the input to further linguistic processing in order to achieve speech understanding. The figure below shows the major components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10-20 msec. These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the values of the model parameters.

Figure: Components of a typical speech recognition system.

Speech recognition systems attempt to model the sources of variability described above in several ways. At the level of signal representation, researchers have developed representations that emphasize perceptually important speaker-independent features of the signal and de-emphasize speaker-dependent characteristics. At the acoustic-phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of data. Speaker adaptation algorithms have also been developed that adapt speaker-independent acoustic models to those of the current speaker during system use. Effects of linguistic context at the acoustic-phonetic level are typically handled by training separate models for phonemes in different contexts; this is called context-dependent acoustic modeling.
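The sketch below illustrates the "features at a fixed rate" step mentioned above: the digitized signal is cut into a frame every 10 ms and a small feature vector is computed per frame. The features used here (log energy and zero crossings) are deliberately simple stand-ins; real systems typically use MFCCs or similar representations.

import math

def extract_features(signal, sample_rate=16000, frame_ms=20, step_ms=10):
    """Slice the digitized signal into overlapping frames (one every
    step_ms, each frame_ms long) and compute a feature vector per frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    step = int(sample_rate * step_ms / 1000)
    features = []
    for start in range(0, len(signal) - frame_len + 1, step):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame)
        log_energy = math.log(energy + 1e-10)
        zero_crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((log_energy, zero_crossings))
    return features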

So, we can say that acoustic-phonetic recognition stores only representations of phonemes for a language and proceeds in three steps:

I. Feature extraction.

II. Segmentation and labeling: segmentation determines when one phoneme ends and another begins; labeling identifies phonemes and outputs a set of phoneme hypotheses that can be represented by a phoneme lattice, a decision tree, etc.

III. Word-level recognition: search for words matching the phoneme hypotheses; the word best matching a sequence of hypotheses is identified, as in the sketch below.
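A minimal sketch of word-level recognition over phoneme hypotheses might look as follows; the lexicon, phoneme symbols and scores are made up for illustration, and insertions or deletions of phonemes are ignored.

def best_word(phoneme_hypotheses, lexicon):
    """Each position holds a dict of {phoneme: score} hypotheses.
    A word's score is the product of the scores of its phonemes;
    the best-scoring word is returned. Words whose length does not
    match the hypothesis sequence are skipped for simplicity."""
    winner, winner_score = None, 0.0
    for word, phonemes in lexicon.items():
        if len(phonemes) != len(phoneme_hypotheses):
            continue
        score = 1.0
        for target, hyps in zip(phonemes, phoneme_hypotheses):
            score *= hyps.get(target, 0.0)  # 0 if the phoneme was not hypothesized
        if score > winner_score:
            winner, winner_score = word, score
    return winner, winner_score

# Hypothetical lexicon and per-position phoneme hypotheses
lexicon = {"to": ["T", "UW"], "two": ["T", "UW"], "tea": ["T", "IY"]}
hypotheses = [{"T": 0.9, "D": 0.1}, {"UW": 0.6, "IY": 0.4}]
print(best_word(hypotheses, lexicon))   # ('to', 0.54) -- 'to'/'two' remain confusable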

Stochastic Processing
Stochastic processing uses Hidden Markov Models (HMMs) to store a model of each of the items that will be recognized; the items are phonemes or subwords. Each state of the HMM holds statistics for a segment of the word; the statistics describe the parameter values and the variation that were found in samples of the word.

Figure: 3-state HMM of a triphone obtained from training.

A recognition system may have numerous HMMs or may combine them into one network of states and transitions.

Stochastic processing using HMM is accurate and flexible.
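The sketch below shows the core of stochastic processing: the Viterbi algorithm finding the most likely state sequence of a small HMM. The 3-state model and its discrete observation probabilities are invented for illustration; real recognizers model continuous feature vectors, typically with Gaussian mixtures.

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Find the most likely state sequence for a sequence of observations
    under an HMM (Viterbi algorithm)."""
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][observations[t]], p)
                for p in states)
            V[t][s] = prob
            back[t][s] = prev
    # trace back the best path
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, V[-1][last]

# Hypothetical 3-state left-to-right HMM with discrete observations "a"/"b"
states = ["begin", "middle", "end"]
start_p = {"begin": 1.0, "middle": 0.0, "end": 0.0}
trans_p = {"begin": {"begin": 0.6, "middle": 0.4, "end": 0.0},
           "middle": {"begin": 0.0, "middle": 0.7, "end": 0.3},
           "end": {"begin": 0.0, "middle": 0.0, "end": 1.0}}
emit_p = {"begin": {"a": 0.7, "b": 0.3},
          "middle": {"a": 0.2, "b": 0.8},
          "end": {"a": 0.5, "b": 0.5}}
print(viterbi(["a", "a", "b", "b", "a"], states, start_p, trans_p, emit_p))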

Variability
Co-articulation
Inter-speaker differences
Intra-speaker inconsistencies
Robustness of a system: how the system performs under variability

Co-articulation:
Co-articulation in phonetics refers to the assimilation of the place of articulation of one speech sound to that of an adjacent speech sound. For example, while the sound /n/ of English normally has an alveolar place of articulation (alveolar consonants are articulated with the tongue against the alveolar ridge), in the word "tenth" it is pronounced with a dental place of articulation (dental consonants are articulated with the tongue against the upper teeth, the lower teeth, or both).

Inter-speaker differences:
Speech variation is a crucial issue for speaker recognition and identification. Understanding acoustic parameters and their variation in speech is therefore significant for the evaluation of effective parameters. Each person has his or her own features of speech that can't be superseded by others, even though a successful imitation or a disguised voice can sometimes confuse two speakers.

Intra-speaker variation:

Intra-speaker variation exists universally and is one of the greatest obstacles for speaker recognition. Variation occurs due to:
1) change of status of the speaker
2) the speech environment
3) the speaker's health
4) intentional imitation or disguise, etc.

Speech system architecture


The main stages are: signal processing, decoding, understanding, interaction, and action.

Figure: A generic speech system. Speech enters the signal processing stage and is passed to the decoder, then to the parser and post parser. A dialog manager, assisted by domain agents, decides how to respond and drives a language generator and speech synthesizer; the system's output reaches the user as speech, a display, or an effector.
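A skeleton of how these stages chain together might look as follows; every function here is a hypothetical placeholder for a full component, and the example command is invented.

# A minimal skeleton of the generic speech system above.

def signal_processing(audio):
    # reduce dimensionality of the signal / condition noise (placeholder)
    return list(audio)

def decoder(features):
    # transcribe speech to words (placeholder transcription)
    return "play some jazz"

def parser(text):
    # extract semantic content from the utterance
    return {"intent": "play_music", "genre": "jazz"}

def dialog_manager(semantics):
    # map user input and system state into an action
    return {"action": "start_playback", "genre": semantics["genre"]}

def language_generator(action):
    # decide how to put the concept into words
    return "Playing " + action["genre"] + " for you."

def speech_synthesizer(text):
    # text-to-speech stand-in
    print(text)

def run(audio):
    features = signal_processing(audio)
    text = decoder(features)
    semantics = parser(text)
    action = dialog_manager(semantics)
    speech_synthesizer(language_generator(action))

run([0.0, 0.1, -0.1])   # prints: Playing jazz for you.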

1) Signal processing:
Signal processing is the analysis, interpretation and manipulation of signals. Signals of interest include sound, images, biological signals, and radar signals. Processing of such signals includes storage and reconstruction, separation of information from noise, compression (e.g. image compression) and feature extraction (e.g. speech-to-text conversion).

2) Decoding speech
Figure: Decoding speech. The signal processing stage reduces the dimensionality of the signal and performs noise conditioning; the decoder then transcribes speech to words using acoustic models and language models, both of which are corpus-based statistical models.

Acoustic models: These are created by taking audio recordings of speech and their text transcriptions and using software to create statistical representations of the sounds that make up each word. They are used by the speech recognition engine to recognize speech.

Language models:
A statistical language model assigns a probability to a sequence of words by means of a probability distribution, as sketched below.
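A minimal sketch of such a model is the bigram model below, trained on a tiny made-up corpus; production language models are trained on millions of words and use smoothing to handle unseen word pairs.

from collections import defaultdict

def train_bigram_model(corpus):
    """Estimate P(word | previous word) from a list of sentences
    (each a list of words) by simple relative-frequency counting."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {w: c / total for w, c in following.items()}
    return model

def sentence_probability(model, sentence):
    """Probability the model assigns to a word sequence (unsmoothed)."""
    p = 1.0
    tokens = ["<s>"] + sentence + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        p *= model.get(prev, {}).get(word, 0.0)
    return p

# toy corpus, invented for illustration
corpus = [["call", "home"], ["call", "my", "bank"], ["check", "my", "balance"]]
lm = train_bigram_model(corpus)
print(sentence_probability(lm, ["call", "my", "balance"]))   # about 0.167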

Detail of Decoder
Figure: Detail of the decoder. Speech data, together with its transcriptions, is used to train the acoustic models; text data is used to train the language models.

Decoder: a device that does the reverse of an encoder, undoing the encoding so that the original information can be retrieved.

Transcribe: To copy data from one medium to another, e.g. from one source document to another or from a source document to a computer. It often implies a change of format or codes.

3) Understanding speech

Figure: Understanding speech. The parser extracts the semantic content from the utterance, guided by a grammar (built through ontology design and language acquisition); the post parser then introduces context and world knowledge into the interpretation, drawing on the dialog context and on domain agents (grounding, knowledge engineering).
Grammar: the study of the rules governing the use of a language. The set of rules governing a particular language is the grammar of that language; thus each language can be said to have its own distinct grammar. "Grammar" has two meanings: 1) the internal rules themselves, and 2) the description and study of those rules.

Parsing: In computer science and linguistics, parsing is the process of analyzing a sequence of tokens to determine its grammatical structure with respect to a given formal grammar. A parser is the component of a compiler that carries out this task; a toy example is sketched below.
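As a toy illustration of checking a token sequence against a formal grammar, the sketch below uses a made-up command grammar; real natural-language parsers must cope with ambiguity and typically use chart-based or statistical methods.

# Hypothetical command grammar:
#   command -> verb object
#   verb    -> "play" | "call" | "open"
#   object  -> one or more remaining words

VERBS = {"play", "call", "open"}

def parse_command(tokens):
    """Return a parse tree (nested tuples) or raise ValueError."""
    if not tokens or tokens[0] not in VERBS:
        raise ValueError("expected a verb at the start of the command")
    if len(tokens) < 2:
        raise ValueError("expected an object after the verb")
    verb = ("verb", tokens[0])
    obj = ("object", " ".join(tokens[1:]))
    return ("command", verb, obj)

print(parse_command("call steven smith".split()))
# ('command', ('verb', 'call'), ('object', 'steven smith'))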

4) Interacting with the user

Figure: Interacting with the user. The dialog manager, driven by task schemas obtained from task analysis, guides the interaction through the task, maps user inputs and the system state into actions, and interacts with one or more back-ends. Domain agents interpret information using domain knowledge, drawing on the dialog context, databases, live data (e.g. the Web), and domain experts (knowledge engineering).

Domain agent: a software agent designed to facilitate the movement of information and knowledge between software designed for different application domains.
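A minimal, hypothetical sketch of a dialog manager that maps user inputs and system state into actions (a slot-filling view of "guiding the interaction through the task") might look like this; the task schema and slot names are invented.

# Assumed task schema for a "play music" task: the slots the task still needs.
REQUIRED_SLOTS = ["artist", "album"]

def dialog_step(state, user_input):
    """Update the dialog state with the user's input and decide the
    next system action."""
    state.update(user_input)               # fill whatever slots were provided
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return {"action": "ask", "slot": slot}   # prompt for missing info
    return {"action": "play", "artist": state["artist"], "album": state["album"]}

state = {}
print(dialog_step(state, {"artist": "Miles Davis"}))   # asks for the album
print(dialog_step(state, {"album": "Kind of Blue"}))   # issues the play action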

5) Language generator:
Language generation is the task of producing natural language from a machine representation such as a knowledge base or a logical form. Here the system decides how to put concepts into words.
6) Speech synthesis:
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer and can be implemented in software or hardware. A text-to-speech system converts normal language text into speech; other systems convert a phonetic transcription into speech.

NATURAL LANGUAGE PROCESSING


Natural language processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.

Early systems such as SHRDLU, working in restricted "blocks worlds" with restricted vocabularies, worked extremely well, leading researchers to excessive optimism which was soon lost when the systems were extended to more realistic situations with real-world ambiguity and complexity.

Natural language understanding is sometimes referred to as an AI-complete problem, because natural language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it. The definition of understanding is one of the major problems in natural language processing.

Some examples of the problems faced by natural language understanding systems:

The sentences "We gave the monkeys the bananas because they were hungry" and "We gave the monkeys the bananas because they were overripe" have the same surface grammatical structure. However, in one of them the word "they" refers to the monkeys, while in the other it refers to the bananas: the sentence cannot be understood properly without knowledge of the properties and behaviour of monkeys and bananas.

Strings of words may be interpreted in myriad ways. For example, the string "Time flies like an arrow" may be interpreted in a variety of ways:

1) Time moves quickly, just like an arrow does;
2) Measure the speed of flying insects like an arrow would (time them);
3) Measure the speed of flying insects that are like arrows;
4) Time those flies that are like arrows;
5) A type of flying insect, "time-flies", enjoy arrows (compare "Fruit flies like a banana").

The word "time" alone can be interpreted as three different parts of speech: a noun in the first interpretation, a verb in interpretations 2, 3 and 4, and an adjective in interpretation 5.

English is particularly challenging in this regard because it has little inflectional morphology to distinguish between parts of speech.

English and several other languages don't specify which word an adjective applies to. For example, consider the string "pretty little girls' school".

Does the school look little? Do the girls look little? Do the girls look pretty? Does the school look pretty?

The major tasks in NLP:

Text to speech
Speech recognition
Natural language generation
Machine translation
Question answering
Information retrieval
Information extraction
Text proofing
Automatic summarization

Some problems which make NLP difficult

Speech segmentation

In most spoken languages, the sounds representing successive letters blend into each other, so the conversion of the analog signal to discrete characters can be a very difficult process. Also, in natural speech there are hardly any pauses between successive words; locating those word boundaries usually must take into account grammatical and semantic constraints, as well as the context.

Text segmentation

Some written languages, like Chinese and Thai, do not mark word boundaries either, so any significant text parsing usually requires the identification of word boundaries, which is often a non-trivial task; a simple baseline is sketched below.
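A common simple baseline for such segmentation is greedy maximum matching against a dictionary, sketched below on a toy English string written without spaces; the dictionary and input are made up, and real segmenters use statistical models to resolve the many cases where greedy matching fails.

def max_match(text, dictionary, max_word_len=5):
    """Greedy maximum matching: at each position take the longest
    dictionary word that fits; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                words.append(candidate)
                i += length
                break
    return words

dictionary = {"the", "table", "is", "set"}
print(max_match("thetableisset", dictionary))   # ['the', 'table', 'is', 'set']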

Word sense disambiguation

Many words have more than one meaning; we have to select the meaning which makes the most sense in context, as in the sketch below.
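One simple, classical approach is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the surrounding context. The glosses below are invented purely for illustration.

def lesk(word, context, glosses):
    """Simplified Lesk algorithm: pick the sense of `word` whose gloss
    shares the most words with the surrounding context."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses[word].items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# hypothetical sense glosses
glosses = {"bank": {
    "finance": "an institution where you deposit money and check your account balance",
    "river": "the sloping land beside a body of water or river"}}
print(lesk("bank", "I need to check my account balance at the bank", glosses))
# 'finance'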

Syntactic ambiguity

The grammar for natural languages is ambiguous, i.e. there are often multiple possible parse trees for a given sentence. Choosing the most appropriate one usually requires semantic and contextual information.

Imperfect or irregular input

Foreign or regional accents and vocal impediments in speech; typing or grammatical errors, OCR errors in texts.
