
PROJECT DETAIL

VOICE ACTIVATED REMOTE CONTROLLER



Abstract
The Voice Activated Remote Control project addresses the needs of people who do not like to search
for the remote control, or who do not have the energy to walk up to the television or any other
device that uses a remote control. The project aims to create a device that accepts audio input and
sends a corresponding signal to a second device, placed atop the equipment to be controlled, which
performs the required task. We will develop an application that runs on a device such as a computer
or PDA and sends the signal to a set-top device that we will build. At the conclusion of this
project, we will have a set-top device that receives Bluetooth signals from any Bluetooth-capable
device, together with software that enables Bluetooth-enabled PDAs to take voice commands and
transfer them to our device. By creating a separate set-top box, we keep the product compatible
with future devices that may integrate Bluetooth.
II. Project Proposal Plan
II.1. Introduction

Our project is a voice-activated remote control: a device that controls a television set using
voice commands. Instead of relying on the traditional infrared remote control alone, we plan to
extend its transmit range by adding a Bluetooth transmitter/receiver pair to the system. A
processor, either that of a PDA or a DSP, will analyze the voice commands given by the user and then
send the command via its attached Bluetooth transmitter. At the other end, by the television, a
customized Bluetooth receiver will receive the signal.


II.2. Design requirements
II.2.1. Functional Description of the Design and its Components







Figure 1: Diagram of the project

Finally it converts the RF signal into a compatible infrared signal to be sent on a modified remote
control.
Although our project scope focuses only on controlling a television set, the project can be
modified for a number of applications. One example is a voice-activated garage door opener: the
driver would no longer have to take his or her eyes off the road to press a button to open the
garage door. Another application would be a voice-activated VCR programmer, to name just a few.
Currently, the group aims to develop a prototype using two laptops connected via Bluetooth. We will
develop an interface for users to speak to and use a program to analyze the voice. The command will
then be transferred to a module on the TV, which converts the command to infrared. After a working
prototype has been successfully developed, we will move to a PDA. Finally, if time permits, we will
build a remote control using a DSP chip. The technical and financial details of this project are
discussed later in the report.



Figure 2: Block Diagram of Project
II.2.2. Technical Description of the Design and its Components
The microphone will be a miniature microphone based on one sold by Radio Shack, specifically
catalog number 33-3026. It will be connected to the processor through a standard RCA jack (or
directly to the board we are working with), wired to the appropriate input pins of the processor.
The microprocessor chip will be placed in a socket so that it is easy to replace if the chip is
damaged. We plan to use a PDA as our processor at first and, if that is successful, to upgrade to a
standalone DSP chip and microprocessor combination. The DSP will look something like the one
pictured below. The PDA we plan to use is a Toshiba e740, which is also pictured below.



An example of the DSPs we are planning to use is TI's TMS320C54x line. The DSP will be programmed
to do voice recognition, after which it will output to a microcontroller, which in turn will
convert the interpreted command into a Bluetooth signal using the appropriate protocol options. The
Bluetooth device is a Class 2 Bluetooth transmitter, which means its maximum output power is 2.5 mW
(about 4 dBm) and its typical range is roughly 10 m. This class was chosen so as not to cause
interference with any neighboring Bluetooth devices.
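Transmit power quoted in dBm is a logarithmic measure relative to 1 mW, so a level of a few dBm
corresponds to only a few milliwatts. The conversion can be sketched as follows; the function names
are illustrative and not part of any Bluetooth library.

```python
import math


def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts (0 dBm is defined as 1 mW)."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw):
    """Convert a power level in milliwatts to dBm."""
    return 10 * math.log10(mw)


# Print a few levels in the range relevant to short-range radios.
for level in (0, 4, 5, 20):
    print(f"{level:>2} dBm = {dbm_to_mw(level):.2f} mW")
```

For comparison, 4 dBm works out to about 2.5 mW, while 20 dBm is 100 mW.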
There will also be a speaker connected to the microcontroller to allow communication with the end
user. An example of a speaker that can be used is from the following website
(http://www.cablesonline.net/8ohm02incoms.html). Essentially, this is a standard computer speaker.
It was chosen because it is very cheap and meets the objective of being an effective communication
medium with the end user. The two wires from the speaker will be soldered directly to the
microcontroller socket to reduce the size of the housing for the speaker, microphone, DSP, its
socket, and the Bluetooth transmitter, all of which will be assembled on a perforated board.
Before we implement this setup we will have an intermediate step in which we use a PDA running the
Pocket PC operating system. It will serve exactly the same function as the DSP connected to a
microcontroller and a Bluetooth module, acting as a sort of simulation environment for the actual
DSP. If that approach works better, we will leave the solution as is. The PDA will be Bluetooth
enabled and will automatically transmit to the receiver.
On the receiving end, another Bluetooth module will be used; however, it will be set to receive the
signal from the transmitter. This module will be connected directly to the microcontroller, which
will interpret the Bluetooth signal and generate the appropriate infrared signal for the device
currently being operated (for example, the television). The Bluetooth module will be soldered to
the microcontroller socket. An example microcontroller is found on the following website:
http://www.seafiremicros.com/products.html. This product includes cables for connection to the
computer and software to aid in programming it.
The output from the microcontroller will be connected to an infrared transmitter, which is
basically an infrared LED. One can be found at http://www.goldmine-elec.com/pdf/241page56.pdf for
approximately $9. If this product is deemed too expensive, we can purchase a cheap remote control
and remove its IR transmitter. The IR transmitter will be placed in front of the television or
other device and will transmit the appropriate signals to it. The transmitter will be soldered
directly to the microcontroller socket.
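The receiving side's job, turning a command received over Bluetooth into the infrared code for the
television, can be pictured as a simple table lookup. The sketch below is only an illustration: the
one-byte command format, the command names, and the IR code values are all made up for the example,
and real codes would have to be taken from the remote control of the actual television.

```python
from typing import Optional

# Hypothetical one-byte command IDs carried in the Bluetooth payload.
COMMAND_NAMES = {
    0x01: "power",
    0x02: "channel_up",
    0x03: "channel_down",
    0x04: "volume_up",
    0x05: "volume_down",
}

# Placeholder IR codes; these are NOT real codes for any television.
IR_CODES = {
    "power": 0xA001,
    "channel_up": 0xA002,
    "channel_down": 0xA003,
    "volume_up": 0xA004,
    "volume_down": 0xA005,
}


def ir_code_for(packet: bytes) -> Optional[int]:
    """Return the IR code to emit for a received packet, or None if it is not understood."""
    if len(packet) != 1:
        return None  # malformed payload: ignore rather than guess
    name = COMMAND_NAMES.get(packet[0])
    if name is None:
        return None  # unknown command ID
    return IR_CODES.get(name)
```

Returning None for anything unrecognized means a corrupted or unknown packet results in no IR
output at all, rather than an unintended command being sent to the television.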
II.2.3. Mathematical or Other Principles Embedded in the Project

There are two basic concepts we need to explore before commencing work on our voice-activated
remote control. First, we have to understand how voice activation works. Then we have to understand
the basic concepts and protocols of Bluetooth.
Voice recognition is the process of taking the spoken word as an input to a computer program. This
process is important to our product because it provides a fairly natural and intuitive way of
controlling the channels on the television, sparing the user the distraction of looking for the
remote control. Here we discuss the principles and concepts behind voice recognition. Although most
of the groundwork has already been done for us, and writing a voice recognition algorithm is beyond
our means, we believe we need to understand at least roughly how it works in order to apply it to
our project.
What is voice recognition, and why is it useful in our project?
Voice recognition is "the technology by which sounds, words or phrases spoken by humans are
converted into electrical signals, and these signals are transformed into coding patterns to which
meaning has been assigned" [ADA90]. While the concept could more generally be called "sound
recognition", we focus here on the human voice because we most often and most naturally use our
voices to communicate our ideas to others in our immediate surroundings. In the context of our
product, a voice-activated remote control, the user would be most comfortable using their most
common form of communication, the voice, rather than pressing buttons. The difficulty in using
voice as an input to a computer simulation lies in the fundamental differences between human speech
and the more traditional forms of computer input. While computer programs are commonly designed to
produce a precise and well-defined response upon receiving the proper (and equally precise) input,
the human voice and spoken words are anything but precise. Each human voice is different, and
identical words can have different meanings if spoken with different inflections or in different
contexts. Several approaches have been tried, with varying degrees of success, to overcome these
difficulties.
How is voice recognition performed?
The most common approaches to voice recognition can be divided into two classes: "template
matching" and "feature analysis". Template matching is the simplest technique and has the highest
accuracy when used properly, but it also suffers from the most limitations. As with any approach to
voice recognition, the first step is for the user to speak a word or phrase into a microphone. The
electrical signal from the microphone is digitized by an "analog-to-digital (A/D) converter", and is
stored in memory. To determine the "meaning" of this voice input, the computer attempts to match the
input with a digitized voice sample, or template, that has a known meaning. This technique is a
close analogy to the traditional command inputs from a keyboard. The program contains the input
template, and attempts to match this template with the actual input using a simple conditional
statement.
Since each person's voice is different, the program cannot possibly contain a template for each
potential user, so the program must first be trained with a new user's voice input before that
user's voice can be recognized by the program. During a training session, the program displays a
printed word or phrase, and the user speaks that word or phrase several times into a microphone.
The program computes a statistical average of the multiple samples of the same word and stores the
averaged sample as a template in a program data structure. With this approach to voice recognition,
the program has a vocabulary that is limited to the words or phrases used in the training session,
and its user base is also limited to those users who have trained the program. This type of system
is known as "speaker dependent." It can have vocabularies on the order of a few hundred words and
short phrases, and recognition accuracy can be about 98 percent.
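The training and matching steps described above can be illustrated with a small sketch. It is a toy
example under strong assumptions: each "recording" is a short list of numbers of equal length,
standing in for digitized and time-aligned audio, and the distance threshold is arbitrary. The
function names are ours, not those of any speech toolkit.

```python
import math


def average_template(samples):
    """Average several equal-length recordings of one word into a single template."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def distance(a, b):
    """Euclidean distance between two equal-length sample vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(sample, templates, threshold):
    """Return the word whose template is closest, or None if nothing is close enough."""
    best = min(templates, key=lambda word: distance(sample, templates[word]))
    # The "simple conditional statement": accept only a sufficiently close match.
    return best if distance(sample, templates[best]) <= threshold else None


# Training session: the user repeats each word several times (toy data).
training = {
    "up": [[0.9, 0.1, 0.8], [1.1, 0.0, 0.9], [1.0, 0.2, 1.0]],
    "down": [[0.1, 0.9, 0.2], [0.0, 1.1, 0.1], [0.2, 1.0, 0.0]],
}
templates = {word: average_template(recs) for word, recs in training.items()}

print(recognize([1.0, 0.1, 0.9], templates, threshold=0.5))
print(recognize([5.0, 5.0, 5.0], templates, threshold=0.5))
```

The first call prints the word "up", because the input is close to that user's averaged template;
the second prints None, because the input resembles nothing in the trained vocabulary.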
A more general form of voice recognition is available through feature analysis, and this technique
usually leads to speaker-independent voice recognition. Instead of trying to find an exact or
near-exact match between the actual voice input and a previously stored voice template, this method
first processes the voice input using Fourier transforms or linear predictive coding (LPC), then
attempts to find characteristic similarities between the expected inputs and the actual digitized
voice input. These similarities will be present for a wide range of speakers, and so the system need
not be trained by each new user. The types of speech differences that the speaker-independent
method can deal with, but which pattern matching would fail to handle, include accents and varying
speed of delivery, pitch, volume, and inflection. Speaker-independent speech recognition has proven
to be very difficult, with some of the greatest hurdles being the variety of accents and inflections
used by speakers of different nationalities. Recognition accuracy for speaker-independent systems is
somewhat less than for speaker-dependent systems, usually between 90 and 95 percent.
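To give a feel for why processing the input with a Fourier transform helps, the sketch below
compares signals by the shape of their frequency spectrum instead of sample by sample. It is a
deliberately simplified illustration, not a speech recognizer: the inputs are synthetic sine tones,
the transform is a plain discrete Fourier transform, and all function names are our own. Real
systems analyze short overlapping frames of speech and use richer features.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform (first half of the bins)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2)
    ]


def normalize(vector):
    """Scale a vector to unit length so that overall loudness drops out."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def similarity(a, b):
    """Cosine similarity of two signals' normalized spectra (near 1.0 = same shape)."""
    fa, fb = normalize(magnitude_spectrum(a)), normalize(magnitude_spectrum(b))
    return sum(x * y for x, y in zip(fa, fb))


def tone(freq, n=64, amplitude=1.0):
    """Synthetic test signal: a sine wave with `freq` cycles over n samples."""
    return [amplitude * math.sin(2 * math.pi * freq * t / n) for t in range(n)]


quiet, loud, other = tone(5, amplitude=0.2), tone(5, amplitude=2.0), tone(12)
print(round(similarity(quiet, loud), 3))
print(round(similarity(quiet, other), 3))
```

The quiet and loud versions of the same tone score close to 1, while the different tone scores
close to 0, even though a direct sample-by-sample comparison would treat the quiet and loud
signals as very different. This is the kind of volume variation the text says feature analysis can
absorb.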
Another way to differentiate between voice recognition systems is by determining whether they can
handle only discrete words, connected words, or continuous speech. Most voice recognition systems
are discrete word systems, and these are easiest to implement. For this type of system, the speaker
must pause between words. This is fine for situations where the user is required to give only
one-word responses or commands, but is very unnatural for multiple word inputs. In a connected word
voice recognition system, the user is allowed to speak in multiple word phrases, but he or she must
still be careful to articulate each word and not slur the end of one word into the beginning of the
next word. Totally natural, continuous speech includes a great deal of "co-articulation", where
adjacent words run together without pauses or any other apparent division between words. A speech
recognition system that handles continuous speech is the most difficult to implement. While
designing our project we need to consider all these aspects in deciding which type of voice
recognition we will need.
As it stands, we only need a discrete word system, or perhaps a connected word recognition system.
We also plan to have the recognition software work for a myriad of users without having to train the
system for each different user.
What disciplines are involved in voice recognition?
The template matching method of voice recognition is founded in the general principles of digital
electronics and basic computer programming. To fully understand the challenges of efficient
speaker-independent voice recognition, we will also look at the fields of phonetics, linguistics,
and digital signal processing to gain further insight into designing our system.
(http://www.hitl.washington.edu/scivw/EVE/I.D.2.d.VoiceRecognition.html)
The next portion describes the basic ideas we need in order to take the information processed by
the DSP/PDA, output it as a Bluetooth signal, and deliver it to the television. We need to examine
at least the four core protocols: Baseband, Link Manager Protocol (LMP), Logical Link Control and
Adaptation Protocol (L2CAP), and Service Discovery Protocol (SDP).
The baseband and link control layer enables the physical radio-frequency link between Bluetooth
devices. It synchronizes all the transmissions to ensure that no data is lost or cut off during
frequency hopping. Baseband also provides two different kinds of physical links, SCO (Synchronous
Connection-Oriented) and ACL (Asynchronous Connectionless). We will be using the ACL link because
we only need to transmit data.
The Link Manager Protocol is responsible for setting up the link between the Bluetooth devices. It
manages such important aspects as authentication and encryption. It also controls the duty cycles
and the connection states of a Bluetooth unit in a piconet.
The Logical Link Control and Adaptation Protocol (L2CAP) adapts the upper layer protocols
(protocols that are not among the four core protocols) over the baseband. It is only supported for
ACL. We do not believe we will need to utilize this protocol, because we won't be using upper layer
protocols. However, we do need to look into these protocols in detail to make sure we understand
exactly how each one is used.
Finally, the fourth core protocol is Service Discovery, which is crucial to the Bluetooth
framework. This protocol lets a device discover which services nearby Bluetooth devices offer, so
that a connection between two or more Bluetooth devices can be established.
(http://www.bluetooth.com)
We need to delve deeper into understanding these protocols, and also to understand several of the
upper level protocols, in order to implement our design.


Figure 5: Typical Speech recognition processes

II.2.4. Performance Expectations/Objectives
The first performance objective associated with our project is the response time between the
issuance of the voice command and the execution of the command. Obviously, the lower the response
time, the better the system performs. The component primarily associated with this performance
metric is the Voice Analyzer module. The module will capture and analyze the speech, determine
whether it is a command, and finally convert the command from infrared signal to Radio Frequency.
All of these operations may increase the response time significantly, especially if the module is
not properly designed.
Another performance objective that we can use to evaluate our system is its sensitivity in
analyzing the voice. Voice and speech patterns vary from one person to another. Our system has to
be able to counter this difference in order to be considered successful. That is, someone who has
an accent should also be able to use this system with a large degree of satisfaction. Likewise, the
system should be able to analyze both low- and high-pitched voices without any problems. Also, given
that there may be background noise coming from the TV itself, the system should be able to recognize
the command in the presence of noise.
The third performance objective we would like to ensure is accuracy. When the user of this system
issues a voice command, say for example "Change to Channel 5", the device should accurately carry
out the operation. If the device instead increases the volume 5 out of the 10 times this command is
given, then it is not performing very well. Accuracy, in our opinion, is the most important
performance metric of all, because the user of the system would be most annoyed if it does not
execute the command correctly.
Ideally, the following performance requirements should be met:

The speech recognizer should recognize the user's voice properly at least 90% of the time.
It should recognize commands from users who have strong accents.
It should recognize the commands despite a relatively low level of noise coming from the
  background and the TV itself. Note: In order to lower the relative level of noise, the user can
  speak louder or closer to the microphone.
99% of the time when the signal is transmitted from the Bluetooth base, the receiver should receive
  the proper signal to send to the TV. That is, once the DSP/PDA has interpreted the voice
  properly, the TV or device being controlled should receive the proper signal 99% of the time.
The D/A converter must properly interpret the signal from Bluetooth 100% of the time.
The time between the issuance of a command and the execution of the command should appear
  instantaneous to the user. In the worst case, the user should not have to wait more than 1
  second for the command to be executed.
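One way to check a prototype against the measurable targets listed above is to log each test
command and compare the totals with the thresholds. The sketch below assumes a hypothetical log
format (the Trial record and its field names are ours); the 90%, 99%, and 1 second figures are the
ones stated in the requirements.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    spoken: str       # command the user actually said
    recognized: str   # command the recognizer reported
    delivered: bool   # whether the receiver got the proper signal
    latency_s: float  # seconds from issuing the command to its execution


def evaluate(trials, min_recognition=0.90, min_delivery=0.99, max_latency_s=1.0):
    """Compare logged trials with the recognition, delivery, and latency targets."""
    correct = [t for t in trials if t.spoken == t.recognized]
    recognition = len(correct) / len(trials)
    # Delivery is judged only on commands that were interpreted properly.
    delivery = sum(t.delivered for t in correct) / len(correct) if correct else 0.0
    worst_latency = max(t.latency_s for t in trials)
    return {
        "recognition_ok": recognition >= min_recognition,
        "delivery_ok": delivery >= min_delivery,
        "latency_ok": worst_latency <= max_latency_s,
    }


# Made-up example log: nine correct commands and one misrecognition.
log = [Trial("channel 5", "channel 5", True, 0.4) for _ in range(9)]
log.append(Trial("channel 5", "volume up", False, 0.6))
print(evaluate(log))
```

With this made-up log all three checks pass; adding a single trial that takes longer than one
second would make the latency check fail, since that requirement is stated as a worst case.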

3. Design approaches
During Design VI last semester, our group got ahead by choosing and finalizing a design project.
This helped us a lot by giving us plenty of time to think about different design approaches; we
also had a lot of time to research the resources that are available to do our job. Our project is
called the Voice Activated Remote Control; in more detail, it is a touchless remote control that
employs Bluetooth technology for transmission in place of infrared signals. As our advisor
predicted, there are a plethora of ways to accomplish our design goals; at this early stage of the
senior design we have already changed some of the items that we agreed on during Design VI. We are
quite confident that more of the things we agreed on will eventually change due to unexpected
limitations that arise.
When we first finalized our project in Design VI, we agreed to use a commercially available DSP
chip as the baseline. We would then assemble this chip with a Bluetooth transmitting chip onto a
circuit board. The second step would be to program the DSP chip to recognize voice commands. The
third step would be to build another circuit board consisting of a microcontroller, a Bluetooth
receiver, and an infrared transmitter. The idea is that the microcontroller will take the signal
from the Bluetooth receiver and convert it to infrared.
The benefit of this approach is that we would learn a lot more about circuit design and digital
signal processing. The risks, however, are whether a suitable DSP can be selected and made to work
according to specifications. Since a DSP chip does not necessarily give a company enough incentive
to provide a full scale of support, it might be difficult to get a problem fixed when one is
encountered. Also, the company that manufactures these DSP chips might not have the product readily
available, which means we might have to wait a couple of months between placing the order and
actually receiving the DSP chip. Another risk would be the programming of the DSP chip to recognize
voice commands. The programming portion of the project would be very intensive and demanding. Even
with an elite set of professional programmers, the voice recognition portion would be a project in
itself; it might not be feasible for us to program the DSP chip from scratch. Another area of
uncertainty is that none of our group members has prior experience with Bluetooth technology; it
might take us some time to experiment with this technology and be able to utilize it to achieve our
goal. Finally, the infrared portion might present some problems: we are not sure whether we can
program the DSP chip on the receiving end to convert the Bluetooth signal into an infrared signal.
Another approach would be to use laptops as the sending and receiving processors, replacing the DSP
chip. With this approach, we would save ourselves a lot of waiting time and money, because we
already have laptops available to work with: we would not need to wait for the delivery of the DSP
chip or purchase an extra piece of hardware. Another benefit of this approach is that plenty of
speech recognition development kits are available from vendors such as Microsoft and IBM; it would
just be a matter of downloading and installing the development kit. In this approach, we would
focus our effort on programming our application in the Windows environment to recognize commands
and then send the command via the Bluetooth transmitter. Our laptops are equipped with a USB port
and an infrared port; this will help us immensely on the receiving end, because the Bluetooth
receiver would have a USB interface. It would be connected to the laptop and would be responsible
for receiving the Bluetooth signals sent by the transmitter. On the receiving end, we would write a
program to take the Bluetooth signal, convert it to an infrared signal, and send it via the
infrared port to control the television set. Just like the first approach, this one carries
uncertainties. However, we think that the uncertainty in this approach is significantly less than
in the DSP approach.
The uncertainties that might arise include whether we would be able to find a speech recognition
engine that is speaker independent. Speaker-independent recognition is critical to the success of
our project; the final product has to be usable by a stranger. Another challenge, once we find a
speaker-independent speech engine, would be how well the program can recognize speech. A successful
project would be one that takes fewer than three tries to recognize and execute a command. Another
noteworthy point is that we were told by our professor that Bluetooth might be replaced by Ultra
Wide Band technology. If this is the case, by the time we complete our project, it might not have
any market value. However, we cannot migrate to Ultra Wide Band technology at this point in time,
since the technology is not yet standardized and the only vendor that has a purchasable product is
not following industry standards. The same risk that we have in the first approach, in the
conversion from Bluetooth to infrared signals, holds for this approach: we cannot be sure that we
will be able to successfully convert the signals.
The third approach that we have in mind is to use PDAs running Windows CE (PocketPC) in place of
the laptops or DSP. We would have two PDAs, one acting as the transmitter and the other as the
receiver. The sender would be responsible for recognizing voice commands and sending the command
information via Bluetooth; just like in the previous two approaches, the receiver would be
responsible for receiving the Bluetooth signal and converting it to an infrared signal. We think
that the PDA approach might have a higher level of risk and would cost more money to implement
(since we do not have PDAs and the SDK would be costly). Nevertheless, this would be the approach
of choice, because the size of the PDAs would be the closest to a regular remote control. It would
be more challenging, since programming in the PDA environment would be new to some of us. It would
also be more interesting to work with a platform that is not commonplace.
The biggest concern we had was finding an SDK for Windows CE. After more research, however, we
found that such SDKs are available from third-party vendors such as IBM at a relatively high cost.
The problem then is whether we can afford the price of the SDK, which is needed to develop the
application on the Windows CE platform. The other concern that we have with this approach is the
quality of the microphone that comes with the PDA; voice recognition often does not work well with
poor-quality microphones, and we might have a problem.
