PROJECT PROPOSAL
By A. Student
Project Instructor
Martin Stanton
Executive Summary
Many people dislike using a keyboard for every small task on their computer, and
others cannot use a computer at all because of a disability. The main idea of the
proposed project is to carry out detailed research into speech recognition in
order to create a "speech recognition bot" that helps individuals use their
computer without using their hands. The bot is intended to answer users' queries
and perform certain tasks given by users, such as pressing buttons, minimizing
and maximizing windows, and opening a particular website or program. In some
cases the conversation between the user and the bot might even turn funny, with
the app giving witty remarks or telling jokes. The range of tasks will be limited,
but the application is intended to be engaging and useful to its users.
Title
Research on speech recognition techniques while building a speech recognition bot
Background to Project
The concept of speech recognition dates back to the early 1950s, when it was all
about voice activation and recognition. Later, during the 1980s, the National
Science Foundation funded work on the technology, which was used for people with
disabilities, including people with sclerosis and cerebral palsy. At that time
speech recognition programs were complex and limited: they either supported only
a small vocabulary or had to be used by one particular individual. By the end of
the 1990s speech recognition had improved greatly and was being applied in many
fields. Today many machines, gadgets and even vehicles include a speech
recognition facility. Popular companies such as Apple, Microsoft and Google have
their own speech assistants for their devices.
Aims
The project aims to take the user's speech as input, reply to it with an answer,
and control the user's computer based on that speech.
While building the application, the project also aims to identify flaws in
current speech recognition techniques and suggest possible ways to overcome them.
Objectives
The proposed project will use speech recognition to create a bot that can
communicate with users, with or without disabilities, and help them use their
computer without typing or clicking.
A further technical objective is to find the best way to implement the grammar
and dictionary for the speech recognition bot.
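One way to approach the grammar and dictionary objective is sketched below in
Java. It is a minimal sketch, not the project's actual design: it assumes the
grammar is written in JSGF (a grammar format that Sphinx-4 can read) and that the
dictionary can be treated as a plain set of known words. The class name
CommandGrammarSketch, its methods and the sample word lists are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Sphinx-4): builds a small JSGF command
// grammar and checks its words against a dictionary word list.
final class CommandGrammarSketch {

    // Renders a JSGF grammar with one public rule of the form: <action> <target>.
    static String toJsgf(String grammarName, List<String> actions, List<String> targets) {
        return "#JSGF V1.0;\n"
             + "grammar " + grammarName + ";\n"
             + "public <command> = (" + String.join(" | ", actions) + ") "
             + "(" + String.join(" | ", targets) + ");\n";
    }

    // Returns the grammar words that the dictionary does not cover.
    // The dictionary is assumed to hold lower-case words.
    static List<String> missingWords(List<String> grammarWords, Set<String> dictionaryWords) {
        List<String> missing = new ArrayList<>();
        for (String word : grammarWords) {
            if (!dictionaryWords.contains(word.toLowerCase(Locale.ROOT))) {
                missing.add(word);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample vocabulary, chosen only for illustration.
        List<String> actions = Arrays.asList("open", "close");
        List<String> targets = Arrays.asList("chrome", "calculator", "paint");
        System.out.print(toJsgf("commands", actions, targets));

        // A deliberately incomplete dictionary, to show the coverage check.
        Set<String> dictionary =
                new HashSet<>(Arrays.asList("open", "close", "calculator", "paint"));
        List<String> words = new ArrayList<>(actions);
        words.addAll(targets);
        System.out.println("Not in dictionary: " + missingWords(words, dictionary));
    }
}
```

Checking the grammar against the dictionary before recognition starts is one
simple way to catch words the recognizer would have no pronunciation for.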
Justification
“Having a machine to understand fluently spoken speech has driven speech
research for more than 50 years. Although ASR technology is not yet at the point
where machines understand all speech, in any acoustic environment, or by any
person, it is used on a day‐to‐day basis in a number of applications and services”
(Docsoft, 2009). Speech recognition has improved a great deal, but it is still
not good enough to understand every individual's accent. This was demonstrated in
a YouTube video (Brajonnz, 2011) which shows the iPhone 4S's speech recognition
facility having problems with Scottish accents. The topic is vast and still needs
further research and optimization. A Google survey reported by Mack (2014) showed
that 55 percent of teens between the ages of 13 and 17 use voice search at least
twice every day. The number of users of speech recognition facilities is
therefore increasing day by day, and I find the topic fascinating for research.
Speech recognition allows people to use the computer with ease, and it can help
people with certain disabilities to use the machine even if they cannot use
their hands or fingers. Although the proposed application is mainly meant to
control the PC and answer users' queries, I hope it will also help to improve
speech recognition at some point and give other researchers of this topic useful
information on speech recognition technology.
The proposed project will not use any real person as a subject or collect
personal data. The application is simply operated by its user, and no user
information is stored, so the project raises no ethical or data protection issues.
The proposed application is not an AI agent: it is not expected to improve its
recognition ability with use, and it is not intended to learn new words by
itself. It is not supposed to open every application; only selected applications
such as Google Chrome, Calculator, Paint and Photoshop are likely to work.
The application will come with a crib sheet (quick reference), and it will not be
able to understand or perform any function that is not listed on the crib sheet.
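The crib sheet restriction described above can be sketched as a fixed lookup
table: only phrases that were registered in advance resolve to an application,
and everything else is rejected. This is a minimal Java sketch under stated
assumptions; the class CribSheetSketch, the example phrases and the choice to
return an application name (rather than actually launching it) are illustrative
and not taken from the project.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical crib sheet: a fixed table of supported phrases.
final class CribSheetSketch {

    // LinkedHashMap keeps the phrases in the order they were registered.
    private final Map<String, String> commands = new LinkedHashMap<>();

    // Adds one supported phrase and the application it refers to.
    void register(String phrase, String applicationName) {
        commands.put(normalize(phrase), applicationName);
    }

    // Returns the application for a recognized utterance, or empty if the
    // utterance is not on the crib sheet.
    Optional<String> resolve(String utterance) {
        return Optional.ofNullable(commands.get(normalize(utterance)));
    }

    // Renders the crib sheet as text, one supported phrase per line.
    String render() {
        StringBuilder sheet = new StringBuilder();
        for (Map.Entry<String, String> entry : commands.entrySet()) {
            sheet.append('"').append(entry.getKey()).append("\" opens ")
                 .append(entry.getValue()).append('\n');
        }
        return sheet.toString();
    }

    // Lower-cases the text and collapses repeated whitespace so that small
    // differences in the recognizer output still match.
    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        CribSheetSketch sheet = new CribSheetSketch();
        sheet.register("open chrome", "Google Chrome");
        sheet.register("open calculator", "Calculator");
        System.out.print(sheet.render());
        System.out.println(sheet.resolve("Open Chrome").isPresent());
        System.out.println(sheet.resolve("open excel").isPresent());
    }
}
```

Because the same table drives both the lookup and the printed quick reference,
the crib sheet given to users cannot drift away from what the bot accepts.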
The proposed system will use Java SE for the speech recognition work and will be
based on the MVC architecture. The open source library Sphinx-4 will be used to
assist with speech recognition. “Sphinx-4 is a pure Java speech
recognition library. It's very flexible in its configuration, and in order to carry out
speech recognition jobs quite a lot of objects depending on each other should be
instantiated” (sourceforge.net, 2014).
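To illustrate how the MVC split might look, the following Java sketch separates
the recognizer input, the reply data and the output. It is only a sketch under
assumptions: the interface SpeechSource is a hypothetical stand-in for whatever
Sphinx-4 component produces recognized text (no Sphinx-4 calls are shown), and
ReplyModel, BotView and BotController are illustrative names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the recognizer; in the proposal this role would
// be filled by Sphinx-4.
interface SpeechSource {
    // Returns the next recognized utterance, or null when input has ended.
    String nextUtterance();
}

// Model: holds the bot's replies, keyed by the lower-cased utterance.
final class ReplyModel {
    private final Map<String, String> replies = new HashMap<>();

    void addReply(String utterance, String answer) {
        replies.put(utterance.toLowerCase(Locale.ROOT), answer);
    }

    String replyTo(String utterance) {
        return replies.getOrDefault(utterance.toLowerCase(Locale.ROOT),
                "Sorry, I did not understand that.");
    }
}

// View: anything that can present a reply to the user.
interface BotView {
    void show(String text);
}

// Controller: reads utterances, asks the model for a reply, updates the view.
final class BotController {
    private final SpeechSource source;
    private final ReplyModel model;
    private final BotView view;

    BotController(SpeechSource source, ReplyModel model, BotView view) {
        this.source = source;
        this.model = model;
        this.view = view;
    }

    void run() {
        String utterance;
        while ((utterance = source.nextUtterance()) != null) {
            view.show(model.replyTo(utterance));
        }
    }

    public static void main(String[] args) {
        // A fixed list of phrases plays the part of the microphone here.
        Iterator<String> phrases = Arrays.asList("hello", "tell me a joke").iterator();
        ReplyModel model = new ReplyModel();
        model.addReply("hello", "Hi there!");
        new BotController(() -> phrases.hasNext() ? phrases.next() : null,
                model, System.out::println).run();
    }
}
```

Keeping the recognizer behind a small interface means the controller and model
can be exercised with a fixed list of phrases, without a microphone.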
A use case diagram shows how the user interacts with the system and is used to
determine what the user can do in it. The stick figure represents the actor, the
box represents the system boundary, and the tasks inside the box represent the
use cases. I have used a use case diagram for this project in order to limit the
scope of the system.
During this project there may be many challenges, which can be addressed with
techniques such as problem solving, analytical skills, critical review of design
decisions and innovative thinking.
The technical challenges of the project have been identified with the help of the
SWOT problem solving technique.
SWOT: SWOT is a problem solving technique that allows us to identify the possible
Strengths, Weaknesses, Opportunities and Threats of the project we are about to
start. It helps us to find out what we can turn to the project's advantage and
what might put the project at risk.
Deliverable
At the end of the project, a fully working speech recognition bot capable of
answering users' queries is expected to be built. The bot is also expected to be
able to control the user's computer.
The following source files are expected at the end of the project in order to
create the speech recognition bot:
|- SpeechRecogBot.java
|- Grammar.java
|- SpeechRecogBot.config.xml
Working application and user manual: this part of the deliverable will consist of
the fully working Java source files and class files, along with a user manual to
help new users of the system.
Project Plan
An estimate of the project timeline is shown below, along with a Gantt chart.
This is only a rough estimate, so it might change during the course of the
project.
Figure 3: Project Schedule
Figure 4: Gantt chart
References
Admin (2014). sourceforge.net [online]. Available from:
<http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4>. [Accessed 4th
October].
Brajonnz (2011). iPhone has problems with Scottish accents. [online]. Available
from: <http://www.youtube.com/watch?v=1EVnK-pNxaA>. [Accessed 4th October].
Mack, E. (2014). Google reveals our embarrassing voice search habits. Cnet
[online]. 14th October. Available from:
<http://www.cnet.com/news/google-voice-search-siri-cortana-teen-study/>.
[Accessed 15th October].